FOREWORD

The MATH/CHEM/COMP (MCC) meetings started at the Inter-University Centre in
Dubrovnik in 1986 and soon became a widely recognized forum for exchanging the latest ideas
on combinatorial, topological and computational aspects of chemistry and physics. These topics
were presented on three of the six days of MCC'99, namely at the Sessions on Theory
and Applications of Mathematics in Chemistry, Computational and Experimental Chemistry, and
Computations in Chemistry and Physics. To keep pace with recent subjects of wider
interest to the chemical and pharmaceutical community, a Session on Drug and Vaccine Modelling
was organized, as well as a Session on Combinatorial Chemistry and Combinatorics in
Chemistry. In addition, the third day of the meeting accommodated the Fifth Croatian Meeting
on Fullerenes.

These new trends, as well as the traditional topics of the meeting, are partially reflected in the present
issue of the Journal of Chemical Information and Computer Sciences. Other papers from the
meeting will appear in special issues of the journals Croatica Chemica Acta and Fullerene Science
and Technology.

The MCC meetings also encompass courses, usually held in the afternoon sessions, but the
related materials are distributed only locally. The conference contributions, however, are regularly
published, which has so far resulted in 15 special issues of various international journals. The
papers presented at MCC'97 also appeared in this journal (solicited by Ante
Graovac, Dejan Plavsic and Drazen Vikic-Topic), but remained scattered through its 1998
issues. This time we have collected them in one issue, and special thanks for that go to Prof.
Milne, editor of the Journal of Chemical Information and Computer Sciences.

On  the  Relation  between  W'/W  Index,  Hyper-Wiener  Index 
and  Wiener  Number 

Dejan Plavsic,*,a Nella Lers,a and Katica Sertic-Bionda b

a The Ruder Boskovic Institute, P.O.B. 1016, HR-10001 Zagreb, The Republic of Croatia

b Faculty of Chemical Engineering and Technology, University of Zagreb,
HR-10000 Zagreb, The Republic of Croatia

It is shown analytically that the W'/W index, the hyper-Wiener index and
the Wiener number are closely related graph-theoretical invariants for
acyclic structures. A general analytical expression for the hyper-Wiener
index of a tree is also derived.


1.  INTRODUCTION 


The properties of a molecule are a consequence of a complicated interplay of its
topology (atomic connectivity), metric characteristics (bond lengths, valence
and torsion angles), and the detailed dynamics of its electrons and nuclei.^1 Within
many classes of compounds, the variations of molecular metric and electronic
structure are small. Hence, changes of many molecular properties within these
classes may be considered as conditioned by topology alone.^2-9 Molecular topology
can be represented by a (molecular) graph, which is an abstract, essentially
non-numerical mathematical object.^10-15 In order to perform quantitative
topology-property/activity studies of molecules, it is necessary to quantify the structural
information contained in the corresponding graphs. The characterization of a
graph is usually carried out by means of graph invariants^10 (topological
indices^16).

The representation of a molecule by a topological index entails a
considerable loss of information concerning the molecular structure. Hence,
chemists are permanently in pursuit of novel topological indices that would
improve the graph-theoretical characterization of molecular structure and
yield better and easily interpretable regression models of
topology-property/activity relationships. As part of this effort, Randic has
recently put forward a novel bond-additive molecular descriptor, the W'/W
index.^17,18 The index is the sum of the graphical bond orders^17,18 of all edges in a
graph, calculated by means of the Wiener number.^19 He also tested the W'/W
index in single-variable linear regression models by
examining the van der Waals areas of heptanes and some 20 molecular
properties of octane isomers. The close resemblance in quality between the
regressions with the W'/W index as predictor variable and the regressions based on
the hyper-Wiener index^20 and on the Wiener number indicates that these
indices encode very similar information on the topology of acyclic structures.
We have investigated the intercorrelation of these three indices for heptane,
octane and nonane isomers and found in all cases a high value of the coefficient
of determination (r^2 > 0.996) and a strong linear intercorrelation. The plot of the
W'/W index versus the hyper-Wiener index R for nonanes is shown in Figure
1. Such behavior of these three indices hints that a formal relationship might
exist between them.

In this article we discuss the relationship between the W'/W index, the
hyper-Wiener index and the Wiener number for acyclic structures.


2. DEFINITIONS


Wiener Number. The Wiener number, W = W(G), of a connected
undirected graph G with N vertices is defined as^19

$$W = \sum_{i=1}^{N-1} \sum_{j>i} (D)_{ij} \tag{1}$$

where $(D)_{ij}$ denotes the element in the i-th row and j-th column of the distance
matrix D of the graph G. The summation goes over all the entries above the
main diagonal of D. If G is a connected undirected acyclic graph (tree), T, with
N vertices, the Wiener number can also be expressed as^19

$$W = \sum_{e_{ij}} W_{e_{ij}} \tag{2}$$

where

$$W_{e_{ij}} = {}^{1}x_{e_{ij}}\,{}^{2}x_{e_{ij}} \tag{3}$$

$e_{ij}$ denotes the edge connecting vertices i and j of T. The summation in eq 2
goes over all edges in T. ${}^{1}x_{e_{ij}}$ and ${}^{2}x_{e_{ij}}$ in eq 3 denote the number of vertices of
T on the side of the vertex i and on the side of the vertex j of the edge $e_{ij}$,
respectively.
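The two routes to the Wiener number can be compared numerically. The following is a minimal sketch, not part of the original derivation: the adjacency-dictionary representation, the function names and the example molecule (the hydrogen-suppressed graph of 2-methylbutane) are assumptions chosen for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Topological distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def wiener_from_distances(adj):
    """Eq 1: sum of (D)_ij over all vertex pairs i < j."""
    # Every pair is reached once from each end, hence the division by 2.
    return sum(sum(bfs_distances(adj, v).values()) for v in adj) // 2

def side_size(adj, i, j):
    """Number of vertices on the side of vertex i of the tree edge (i, j)."""
    seen = {i}
    stack = [i]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            # Never step across the edge i-j itself.
            if w not in seen and (v, w) != (i, j):
                seen.add(w)
                stack.append(w)
    return len(seen)

def wiener_from_edges(adj):
    """Eqs 2 and 3 (trees only): sum over edges of the product of the side sizes."""
    n = len(adj)
    total = 0
    for i in adj:
        for j in adj[i]:
            if i < j:  # count each edge once
                x1 = side_size(adj, i, j)
                total += x1 * (n - x1)
    return total

# Hypothetical example: chain 1-2-3-4 with vertex 5 attached to vertex 2.
tree = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
print(wiener_from_distances(tree), wiener_from_edges(tree))  # 18 18
```

The edge-based route only needs one traversal per edge, but it is valid for trees alone; the distance-based route works for any connected graph.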


Graphical Bond Order $W'_{e_{ij}}/W$. The graphical bond order $W'_{e_{ij}}/W$ of an
edge $e_{ij}$ of a connected undirected graph G is defined by^17,18

$$W'_{e_{ij}}/W = \frac{W(G-e_{ij})}{W(G)} \tag{4}$$

where W(G) is the Wiener number of G and $W(G-e_{ij})$ denotes the Wiener
number of the spanning subgraph $G-e_{ij}$ obtained from G by deleting the edge
$e_{ij}$. $G-e_{ij}$ is connected if and only if G contains at least one ring and the edge $e_{ij}$
is one of the edges making up the ring(s). A disconnected $G-e_{ij}$ has two
components, say $G_1$ and $G_2$, and the Wiener index in this case is by definition
given by the expression

$$W(G-e_{ij}) = W(G_1) + W(G_2) \tag{5}$$


W'/W Index. The W'/W index of a connected undirected graph G is defined
as^17,18

$$W'/W = \sum_{e_{ij}} W'_{e_{ij}}/W \tag{6}$$

where the summation goes over all edges of G.
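The definitions of the graphical bond order and of the W'/W index translate directly into an edge-deletion procedure. The sketch below is illustrative only; the helper names and the 2-methylbutane example graph are assumptions, and exact fractions are used so that no rounding enters.

```python
from collections import deque
from fractions import Fraction

def wiener(adj):
    """Wiener number; for a disconnected graph only connected pairs are summed,
    which equals W(G1) + W(G2) as in eq 5."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2  # each pair was counted from both ends

def without_edge(adj, i, j):
    """Spanning subgraph G - e_ij: same vertices, edge (i, j) deleted."""
    return {v: [w for w in nbrs if {v, w} != {i, j}] for v, nbrs in adj.items()}

def graphical_bond_orders(adj):
    """Eq 4 for every edge, keyed by (i, j) with i < j."""
    w_full = wiener(adj)
    return {(i, j): Fraction(wiener(without_edge(adj, i, j)), w_full)
            for i in adj for j in adj[i] if i < j}

def w_prime_over_w(adj):
    """Eq 6: sum of the graphical bond orders of all edges."""
    return sum(graphical_bond_orders(adj).values())

# Hypothetical example: hydrogen-suppressed graph of 2-methylbutane.
tree = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
print(graphical_bond_orders(tree)[(2, 3)])  # 5/18
print(w_prime_over_w(tree))                 # 17/9
```

The central edge (2, 3) has the smallest bond order because its removal splits the graph most evenly and therefore destroys the largest share of the distance sum.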


Hyper-Wiener Index. The hyper-Wiener index R was recently introduced
by Randic for acyclic structures.^20 The R index, R = R(T), of a tree T is defined
as

$$R = \sum_{p_{rs}} {}^{1}x_{p_{rs}}\,{}^{2}x_{p_{rs}} \tag{7}$$

where $p_{rs}$ represents the path connecting vertices r and s of T. ${}^{1}x_{p_{rs}}$ and
${}^{2}x_{p_{rs}}$ denote the number of vertices of T on each side of the path $p_{rs}$, including r and
s, respectively. The summation runs over all paths in T. Note that if paths of length
one (edges) are the only paths taken into consideration, then eq 7 reduces to
eq 2. The original definition was extended so as to be applicable to all
connected graphs.^22 The hyper-Wiener index, R = R(G), of a connected graph G
with N vertices is defined as

$$R = \frac{1}{2} \sum_{i=1}^{N-1} \sum_{j>i} \left[ (D)_{ij}^{2} + (D)_{ij} \right] \tag{8}$$
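For a tree, the path-based and the distance-based definitions of the hyper-Wiener index must agree. A minimal sketch of both follows; the function names and the example graph are assumptions, and the side of a path is identified here through the distance matrix (a vertex v lies on the side of r when its route to s passes through r).

```python
from collections import deque

def distance_matrix(adj):
    """All-pairs topological distances as a dict of dicts (connected graph)."""
    matrix = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        matrix[source] = dist
    return matrix

def hyper_wiener_from_distances(adj):
    """Distance-based definition: half the sum of d^2 + d over pairs i < j."""
    d = distance_matrix(adj)
    return sum(d[i][j] ** 2 + d[i][j] for i in adj for j in adj if i < j) // 2

def hyper_wiener_from_paths(adj):
    """Path-based definition (trees only): product of the side sizes of each path."""
    d = distance_matrix(adj)
    total = 0
    for r in adj:
        for s in adj:
            if r < s:
                # v is on the side of r if its unique route to s runs through r.
                x1 = sum(1 for v in adj if d[v][s] == d[v][r] + d[r][s])
                x2 = sum(1 for v in adj if d[v][r] == d[v][s] + d[r][s])
                total += x1 * x2
    return total

# Hypothetical example: hydrogen-suppressed graph of 2-methylbutane.
tree = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
print(hyper_wiener_from_distances(tree), hyper_wiener_from_paths(tree))  # 28 28
```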


3. RELATIONSHIP BETWEEN W'/W INDEX, HYPER-WIENER INDEX

AND  WIENER  NUMBER 


Let T be a connected undirected acyclic graph with N vertices and let $e_{ij}$ be
an edge of T, see Figure 2. The graph $T-e_{ij}$ obtained from T by deleting the
edge $e_{ij}$ has two components, $T_1$ and $T_2$, with ${}^{1}x_{e_{ij}}$ and ${}^{2}x_{e_{ij}}$ vertices,
respectively. Clearly, ${}^{1}x_{e_{ij}} + {}^{2}x_{e_{ij}}$ is equal to N. The Wiener number of the
spanning subgraph $T-e_{ij}$ of T is smaller than the Wiener number of T owing to the
absence of the contribution of the edge $e_{ij}$, which equals ${}^{1}x_{e_{ij}}\,{}^{2}x_{e_{ij}}$, and to the
decrease in the contributions of the remaining edges. An edge of $T-e_{ij}$, say $e_{st}$
(see Figure 2), makes a contribution to $W(T-e_{ij})$ equal to the difference between
the contribution of the corresponding edge in T to W(T), ${}^{1}x_{e_{st}}\,{}^{2}x_{e_{st}}$, and the
product ${}^{s}x_{e_{st}}\,{}^{2}x_{e_{ij}}$. Hence, the difference between W(T) and $W(T-e_{ij})$ is given by

$$W(T) - W(T-e_{ij}) = {}^{1}x_{e_{ij}}\,{}^{2}x_{e_{ij}} + {}^{2}x_{e_{ij}} \sum_{e_{ab}\in E_1} {}^{a}x_{e_{ab}} + {}^{1}x_{e_{ij}} \sum_{e_{cd}\in E_2} {}^{d}x_{e_{cd}} \tag{9}$$

on the condition that

$$(D)_{ia} > (D)_{ib} \tag{10}$$

$$(D)_{jd} > (D)_{jc} \tag{11}$$

${}^{a}x_{e_{ab}}$ (${}^{d}x_{e_{cd}}$) is the number of vertices of $T_1$ ($T_2$) on the side of the vertex a (d)
of the edge $e_{ab}$ ($e_{cd}$). $E_1$ and $E_2$ are the sets of edges of the components $T_1$
and $T_2$, respectively. The summation in the second (third) term on the right-hand
side of eq 9 runs over all edges of the component $T_1$ ($T_2$) of the graph
$T-e_{ij}$.


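The graphical bond order $W'_{e_{ij}}/W$ of the edge $e_{ij}$ of T can be obtained by
dividing eq 9 by W(T):

$$W'_{e_{ij}}/W = 1 - \frac{1}{W(T)} \left( {}^{1}x_{e_{ij}}\,{}^{2}x_{e_{ij}} + {}^{2}x_{e_{ij}} \sum_{e_{ab}\in E_1} {}^{a}x_{e_{ab}} + {}^{1}x_{e_{ij}} \sum_{e_{cd}\in E_2} {}^{d}x_{e_{cd}} \right) \tag{12}$$

This quantity represents the "importance" of the edge $e_{ij}$ in T.

The W'/W index of T is the sum of the graphical bond orders of all N-1 edges of
T:

$$W'/W = N - 1 - \frac{1}{W(T)} \sum_{e_{ij}} \left( {}^{1}x_{e_{ij}}\,{}^{2}x_{e_{ij}} + {}^{2}x_{e_{ij}} \sum_{e_{ab}\in E_1} {}^{a}x_{e_{ab}} + {}^{1}x_{e_{ij}} \sum_{e_{cd}\in E_2} {}^{d}x_{e_{cd}} \right) \tag{13}$$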

By noting that

$$\sum_{e_{ij}} \left( {}^{1}x_{e_{ij}}\,{}^{2}x_{e_{ij}} + {}^{2}x_{e_{ij}} \sum_{e_{ab}\in E_1} {}^{a}x_{e_{ab}} + {}^{1}x_{e_{ij}} \sum_{e_{cd}\in E_2} {}^{d}x_{e_{cd}} \right) = 2R(T) - W(T) \tag{14}$$

the relationship between the W'/W index, the hyper-Wiener index and the
Wiener number of T immediately follows:

$$W'/W = N - \frac{2R(T)}{W(T)} \tag{15}$$

Since 2R is equal to the sum of the (unnormalized) second moment of distance
$D_2$ and the Wiener number, the W'/W index can also be expressed as

$$W'/W = N - 1 - \frac{D_2(T)}{W(T)} \tag{16}$$
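The relationship can be checked numerically by computing W'/W in three independent ways: by edge deletion, from R and W, and from the second moment of distance. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the vertex numbering chosen for 2,2,3,4-tetramethylpentane (chain 1-5 with methyl vertices 6 and 7 on vertex 2, 8 on vertex 3, and 9 on vertex 4) is ours.

```python
from collections import deque
from fractions import Fraction

def pair_distances(adj):
    """Distances of all connected vertex pairs, each pair listed once."""
    result = []
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result.extend(d for v, d in dist.items() if v > source)
    return result

def three_routes(adj):
    """Return W'/W by edge deletion, as N - 2R/W, and as N - 1 - D2/W."""
    n = len(adj)
    d = pair_distances(adj)
    w = sum(d)                      # Wiener number
    d2 = sum(x * x for x in d)      # unnormalized second moment of distance
    r = Fraction(d2 + w, 2)         # hyper-Wiener index, using 2R = D2 + W
    w_prime = 0
    for i in adj:
        for j in adj[i]:
            if i < j:
                pruned = {v: [u for u in nb if {v, u} != {i, j}]
                          for v, nb in adj.items()}
                w_prime += sum(pair_distances(pruned))  # W(T - e_ij)
    return Fraction(w_prime, w), n - 2 * r / w, n - 1 - Fraction(d2, w)

# Assumed numbering of the hydrogen-suppressed graph of 2,2,3,4-tetramethylpentane.
tmp = {1: [2], 2: [1, 3, 6, 7], 3: [2, 4, 8], 4: [3, 5, 9],
       5: [4], 6: [2], 7: [2], 8: [3], 9: [4]}
print(three_routes(tmp))
```

All three values coincide for any tree; for a graph containing rings the first route still applies, but the two closed forms are not expected to hold.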


4. DERIVATION OF GENERAL EXPLICIT FORMULA FOR
HYPER-WIENER INDEX OF TREE

A closer examination of eqs 6 and 15 reveals that they lead to
a general explicit formula for the calculation of the hyper-Wiener
index of T. To wit, combining eqs 6 and 15 one can write

$$R(T) = \frac{N\,W - W'}{2} \tag{17}$$


where $W'$ is the sum of the Wiener numbers of all the spanning subgraphs $T-e$
of  T.  It  is  well-known  that  the  Wiener  number  can  be  calculated  in  a  number  of 
ways."'®’''^’^^'^'^  If  one  selects  the  route^^ 


$$W = \sum_{n=1}^{N-1} n\,{}^{n}p \tag{18}$$

where ${}^{n}p$ denotes the number of paths of length n in T, then the general explicit
formula for the calculation of R(T) can be derived in a rather simple way.

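The total number of paths of a given length n in the spanning subgraphs $T-e$
of T, ${}^{n}p'$, is given by the expression

$${}^{n}p' = \sum_{T-e} {}^{n}p_{T-e} = (N-n-1)\,{}^{n}p \tag{19}$$

where ${}^{n}p_{T-e}$ is the number of paths of length n in $T-e$, and the summation runs
over all the spanning subgraphs $T-e$ of T. The factor $N-n-1$ arises because a path
of length n in T survives in exactly those subgraphs whose deleted edge is not one
of the n edges of the path.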

$W'$ of T is given by

$$W' = \sum_{n=1}^{N-1} n\,{}^{n}p' = \sum_{n=1}^{N-1} n\,(N-n-1)\,{}^{n}p \tag{20}$$

Combining eqs 17, 18 and 20 one obtains:

$$R(T) = \frac{1}{2} \sum_{n=1}^{N-1} (n^{2}+n)\,{}^{n}p \tag{21}$$
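Once the path numbers are known, W, W' and R follow by simple weighted sums. A minimal sketch is given below; the function names are hypothetical, the path numbers are obtained here by counting vertex pairs at each distance (valid in a tree, where every pair is joined by exactly one path), and the vertex numbering of 2,2,3,4-tetramethylpentane is an assumption.

```python
from collections import Counter, deque

def path_counts(adj):
    """Return {n: number of paths of length n} for a tree."""
    counts = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for v, d in dist.items():
            if v > source:  # count each pair once
                counts[d] += 1
    return dict(counts)

def indices_from_path_counts(adj):
    """W (eq 18), W' (eq 20) and R (eq 21) from the path numbers."""
    n_vertices = len(adj)
    p = path_counts(adj)
    w = sum(n * c for n, c in p.items())
    w_prime = sum(n * (n_vertices - n - 1) * c for n, c in p.items())
    r = sum((n * n + n) * c for n, c in p.items()) // 2  # n^2 + n is always even
    return w, w_prime, r

# Assumed numbering: chain 1-5, methyl vertices 6, 7 on 2, 8 on 3, 9 on 4.
tmp = {1: [2], 2: [1, 3, 6, 7], 3: [2, 4, 8], 4: [3, 5, 9],
       5: [4], 6: [2], 7: [2], 8: [3], 9: [4]}
print(path_counts(tmp))               # {1: 8, 2: 12, 3: 10, 4: 6}
print(indices_from_path_counts(tmp))  # (86, 446, 164)
```

With these values, (N W - W')/2 = (9 x 86 - 446)/2 = 164, in agreement with eq 17.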


To calculate the R index of T it is necessary and sufficient to have knowledge of
the ${}^{n}p$'s of T. The number of paths of a given length n in T can be calculated either
by means of the recursive relation or using the general analytical expression for
${}^{n}p$ in T. The recursive relation reads as:

$${}^{n}p = \sum_{(D)_{ij}=n-2} v_i v_j - 2\,{}^{n-1}p - {}^{n-2}p \tag{22}$$

where $v_i$ is the valence of the vertex i, and $(D)_{ij}=n-2$ denotes that the topological
distance between the vertices i and j is equal to n-2. The summation runs over
all the pairs of vertices of T separated by the paths of length n-2. The initial
conditions are

$${}^{1}p = N - 1 \tag{23}$$

$${}^{2}p = \frac{1}{2} \sum_{i} v_i^{2} - N + 1 \tag{24}$$


The general analytical expression for ${}^{n}p$ in T reads as follows:
n-2+k 

"p  =  2  (-1)"'^®(n  -  m  - 1  + 1)  ^  V|  Vj  + 
nn=1  (D)ij=m 

i<] 


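Note that $\sum_{(D)_{ij}=0} v_i v_j = \sum_{i} v_i^{2}$. The parameters k, s and t take the following values:

$$k = \begin{cases} 2 & \text{for } n = 1 \\ 1 & \text{for } n = 2 \\ 0 & \text{for } n \ge 3 \end{cases} \tag{26}$$

$$s = \begin{cases} 1 & \text{for } n \text{ odd} \\ 0 & \text{for } n \text{ even} \end{cases} \tag{27}$$

and

$$t = \begin{cases} 1 & \text{for } n = 1 \\ 0 & \text{for } n \ge 2 \end{cases} \tag{28}$$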

Combining  eqs  21  and  25  one  obtains  the  general  explicit  formula  for 
calculation  of  the  R  index  of  T 


where  the  parameters  k,  s  and  t  take  the  values  from  eqs  26,  27  and  28, 
respectively.  The  application  of  the  formula  is  illustrated  for  the  hydrogen 
suppressed  graph  of  2,2,3,4-tetramethylpentane  in  Figure  3. 

A special case of an acyclic graph is the path graph, $P_N$. Eq 29 in the case of
$P_N$ takes a rather simple form

$$R(P_N) = \frac{1}{2} \sum_{n=1}^{N-1} (n^{2}+n)(N-n) \tag{30}$$

whose summation gives the formula already derived by Lukovits.^36
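The summation in eq 30 can be carried out in closed form with the standard power sums. The steps below are a worked sketch added for illustration, not a quotation of the cited formula; they use only ${}^{n}p = N-n$ for the path graph.

```latex
\begin{align*}
R(P_N) &= \frac{1}{2}\sum_{n=1}^{N-1} (n^2+n)(N-n) \\
 &= \frac{1}{2}\left[ N\sum_{n=1}^{N-1} n(n+1) - \sum_{n=1}^{N-1} n^3 - \sum_{n=1}^{N-1} n^2 \right] \\
 &= \frac{1}{2}\left[ \frac{N^2(N-1)(N+1)}{3} - \frac{N^2(N-1)^2}{4} - \frac{N(N-1)(2N-1)}{6} \right] \\
 &= \frac{N(N-1)}{24}\left[ 4N(N+1) - 3N(N-1) - 2(2N-1) \right] \\
 &= \frac{N(N-1)(N+1)(N+2)}{24}
\end{align*}
```

For N = 4 the closed form gives 15, which matches direct evaluation of eq 30: (2 x 3 + 6 x 2 + 12 x 1)/2 = 15.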




ACKNOWLEDGMENT 


This  work  was  supported  by  the  Ministry  of  Science  and  Technology  of  the 
Republic  of  Croatia  and  by  the  Croatian-Slovenian  project  "Discrete 
Mathematics  in  Chemistry".  We  are  grateful  to  Dr.  Milan  Soskic  (Zagreb)  and 
Professor  Ante  Graovac  (Zagreb)  for  useful  comments. 


REFERENCES  AND  NOTES 


(1) McWeeny, R. Methods of Molecular Quantum Mechanics, 2nd ed.;
Academic Press: London, 1989.

(2)  Rouvray,  D.  H.  Predicting  Chemistry  from  Topology.  Sci.  Am.  1986, 
254(3),  36-43. 

(3)  Turro,  N.  J.  Geometrical  and  Topological  Thinking  in  Organic 
Chemistry.  Angew.  Chem.  Int.  Ed.  Engl.  1986,  25,  882-901. 

(4)  Rouvray,  D.  H.  The  Modeling  of  Chemical  Phenomena  Using 
Topological  Indices.  J.  Comput.  Chem.  1987,  8,  470-480. 

(5) Randic, M. The Nature of Chemical Structure. J. Math. Chem. 1990,
4, 157-189.

(6)  Randic,  M.  Chemical  Structure  -  What  is  She?  J.  Chem.  Educ.  1992, 
69,  713-718. 

(7) Rouvray, D. H. A Rationale for the Topological Approach to
Chemistry. J. Mol. Struct. (Theochem) 1995, 336, 101-114.

(8)  Randic,  M.  On  Characterization  of  Chemical  Structure.  J.  Chem.  Inf. 
Comput.  Sci.  1997,  37,  672-687. 

(9)  Milne,  G.  W.  Mathematics  as  a  Basis  for  Chemistry.  J.  Chem.  Inf. 
Comput.  Sci.  1997,  37,  639-644. 

(10) Harary, F. Graph Theory; Addison-Wesley: Reading, Massachusetts,
1972.

(11) Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity
Analysis; Research Studies Press Ltd.: Letchworth, England, 1986.

(12) Gutman, I.; Polansky, O. Mathematical Concepts in Organic
Chemistry; Springer-Verlag: Berlin, 1986.

(13)  Bonchev,  D.;  Rouvray,  D.  H.  (Eds.)  Chemical  Graph  Theory. 
Introduction  and  Fundamentals;  Abacus  Press,  Gordon  and  Breach: 
New  York,  1991. 

(14)  Trinajstic,  N.  Chemical  Graph  Theory;  2nd  revised  ed.;  CRC  Press: 
Boca  Raton,  FL,  1992. 

(15) Balaban, A. T. (Ed.) From Chemical Topology to Three-Dimensional
Geometry; Plenum Press: New York, 1997.

(16) Hosoya, H. Topological Index. A Newly Proposed Quantity
Characterizing the Topological Nature of Structural Isomers of
Saturated Hydrocarbons. Bull. Chem. Soc. Jpn. 1971, 44, 2332-2339.

(17) Randic, M. Search for Optimal Molecular Descriptors. Croat. Chem.
Acta 1991, 64, 43-54.

(18) Randic, M.; Mihalic, Z.; Nikolic, S.; Trinajstic, N. Graphical Bond
Orders: Novel Structural Descriptors. J. Chem. Inf. Comput. Sci.
1994, 34, 403-409.

(19) Wiener, H. Structural Determination of Paraffin Boiling Points. J. Am.
Chem. Soc. 1947, 69, 17-20.

(20)  Randic,  M.  Novel  Molecular  Descriptor  for  Structure-Property 
Studies.  Chem.  Phys.  Lett.  1993,  211, 478-483. 

(21) Randic, M. Comparative Regression Analysis. Regressions Based on
a Single Descriptor. Croat. Chem. Acta 1993, 66, 289-312.

(22) Klein, D. J.; Lukovits, I.; Gutman, I. On the Definition of the Hyper-
Wiener Index for Cycle-Containing Structures. J. Chem. Inf. Comput.
Sci. 1995, 35, 672-687.

(23)  Bonchev,  D.;  Trinajstic,  N.  Information  Theory,  Distance  Matrix  and 
Molecular  Branching.  J.  Chem.  Phys.  1977,  67,  4517-4533. 

(24) Bersohn, M. A Fast Algorithm for Calculation of the Distance Matrix.
J. Comput. Chem. 1983, 4, 110-113.

(25) Canfield, E. R.; Robinson, R. W.; Rouvray, D. H. Determination of the
Wiener Molecular Branching Index for the General Tree. J. Comput.
Chem. 1985, 6, 598-609.

(26) Barysz, M.; Plavsic, D.; Trinajstic, N. A Note on Topological Indices.
Math. Chem. (Mülheim/Ruhr) 1986, 19, 89-116.

(27)  Muller,  W.  R.;  Szymanski,  K.;  Knop,  J.  V.;  Trinajstic,  N.  An  Algorithm 
for  Construction  of  the  Molecular  Distance  Matrix.  J.  Comput.  Chem. 
1987,  8,  170-173. 

(28)  Mohar,  B.;  Pisanski,  T.  How  to  Compute  the  Wiener  Index  of  a 
Graph.  J.  Math.  Chem.  1988,  2,  267-277. 

(29)  Senn,  P.  The  Computation  of  the  Distance  Matrix  and  the  Wiener 
Index  for  Graphs  of  Arbitrary  Complexity  with  Weighted  Vertices  and 
Edges.  Comput.  Chem.  1988,  12,  219-227. 

(30) Lukovits, I. General Formulas for the Wiener Index. J. Chem. Inf.
Comput. Sci. 1991, 31, 503-507.

(31) Gutman, I.; Yeh, Y. N.; Lee, S. L.; Luo, Y. L. Some Recent Results in
the Theory of the Wiener Number. Indian J. Chem. 1993, 32A, 651-661.

(32) Plavsic, D.; Nikolic, S.; Trinajstic, N.; Klein, D. J. Relation between
the Wiener Index and the Schultz Index for Several Classes of
Chemical Graphs. Croat. Chem. Acta 1993, 66, 345-353.


(33)  Gutman,  I.  A  New  Method  for  the  Calculation  of  the  Wiener  Number 
of  Acyclic  Molecules.  J.  Mol.  Struct.  (Theochem)  1993,  285,  137-142. 

(34)  Nikolic,  S.;  Trinajstic,  N.;  Mihalic,  Z.  The  Wiener  Index:  Development 
and  Applications.  Croat.  Chem.  Acta  1995,  68,  105-129. 

(35) Plavsic, D.; Soskic, M.; Landeka, I.; Trinajstic, N. On the Relation
between the P'/P Index and the Wiener Number. J. Chem. Inf.
Comput. Sci. 1996, 36, 1123-1126.

(36) Lukovits, I. Formulas for the Hyper-Wiener Index of Trees. J. Chem.
Inf. Comput. Sci. 1994, 34, 1079-1081.



FIGURE  CAPTIONS 


Figure 1. Plot of W'/W vs R x 10^-2 for nonanes. The regression equation and
statistical parameters are W'/W = -1.055(±0.011)(R x 10^-2) +
6.899(±0.023); n = 35; r^2 = 0.997; s = 0.026; F^{1,33} = 9507.

Figure 2. a) A connected undirected acyclic graph T. An edge of T is
denoted by $e_{ij}$.

b) The spanning subgraph $T-e_{ij}$ of T with components $T_1$ and $T_2$.

Figure 3. Calculation of the R index of the hydrogen-suppressed graph of
2,2,3,4-tetramethylpentane. Numbers at each site represent the
corresponding graph-theoretical valences.


Abstract

Owing to the one-dimensional character of polymers, an electronic excitation in a
polymer is self-trapped, and two excitons can combine to form a biexciton.
This combination process is an important channel for biexciton formation and
is accompanied by lattice distortion. By solving the dynamic equations, the
relaxation process of this combination is investigated, and the
relaxation time is found to be about 160 fs.

* This work was supported by the National Science Foundation of China (Grant Nos. 59790050 and
19874014), Project 863-715-010, and the Shanghai Center for Applied Physics.


I. INTRODUCTION


Conjugated polymers with nondegenerate ground states, such as polyparaphenylenevinylene
(PPV) and polyphenylquinoxaline (PPQ), have excellent nonlinear optical
properties and can be used as the active luminescent layer in new polymer-based
electroluminescence LED devices [1]. The excited states, especially excitons and biexcitons, play central roles
in these photophysical processes [2]. Recently it was proposed that the biexciton state possesses
a novel property, negative polarizability: its induced dipole moment is in the direction opposite
to the external electric field [3]. A straightforward way to form a biexciton is for an exciton to
absorb a photon, but the efficiency of this two-photon process is low.
There is another channel for forming the biexciton. The polymer chain is a quasi-one-dimensional
system, and owing to the strong coupling between electron and lattice motion, an electronic excitation
must be accompanied by a distortion of the bond structure. This self-trapping effect increases the
binding energy of the excited state [4]. As a result, the energy of a self-trapped biexciton is
lower than that of two separate excitons. Thus, two self-trapped excitons in one chain can
evolve into a single biexciton. Under photoexcitation, many excitons are generated, and in the
case of electroluminescence, the injection of charge carriers also produces many excitons. When
two moving excitons encounter each other, they combine into a biexciton, which is an important
channel for biexciton formation [5].

In this paper, the evolution of the combination is simulated by solving the dynamic equations.
From the evolution of the bond structure and electronic spectrum, it is found that the
relaxation time of the combination is about 160 fs.


II.  MODEL  AND  METHOD 


As  usual,  the  SSH  Hamiltonian  [4]  (modified  by  Brazovskii  and  Kirova  [6])  is  used  to  model 
the  polymer  in  its  nondegenerate  ground  state,  consisting  of  N  lattice  sites  and  N  electrons: 


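$$H_0 = -\sum_{n,s} \left[ t_0 - \alpha (u_{n+1} - u_n) + (-1)^n t_e \right] \left( a^{\dagger}_{n+1,s} a_{n,s} + \mathrm{h.c.} \right) + \frac{K}{2} \sum_n (u_{n+1} - u_n)^2 + K' \sum_n (u_{n+1} - u_n) \tag{2.1}$$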

Here $u_n$ is the displacement of the nth atom, $a_{n,s}$ is the annihilation operator of an electron with spin s at
the nth atom, the nondegeneracy of the ground state is described by $t_e$, and the final term is added to
avoid the collapse of the open chain with a finite length [7,8]. When an external electric field
E is applied, the potential energy of electrons is


$$H_1 = eE \sum_{n,s} \left[ \left( n - \frac{N+1}{2} \right) a + u_n \right] a^{\dagger}_{n,s} a_{n,s} \tag{2.2}$$


We define the order parameter as $\phi_n = (-1)^n u_n$ and divide the total Hamiltonian into an
electron part $H_e$ and a lattice part.


n 


For a given lattice configuration $\{\phi_n\}$, we can diagonalize $H_e(\{\phi_n\})$ to obtain the electronic
energy spectrum $\{\varepsilon_i\}$ and the corresponding eigenstates $|i\rangle$. Then the total energy is


(2.4) 


OCCU  jy 

^  +  "y  X/  ~  ^  {<^1  +  (  — • 

:  n 

Because the mass of an atom is much larger than that of an electron, we can use
the adiabatic approximation [9]. The nth atom experiences a force

(2.5) 

By using the Hellmann-Feynman theorem, we get
ds 

^  =  S  (-1)"  (^1*)  +  1|^)  (1  -  Kn)  -2a{n-  l|i)  (1  -  6^,1)  -f  {n\i)  eE) .  (2.6) 

t 

Taking a short interval $\tau$ as the time step, the dynamic equations (2.5) can be solved
numerically. Step by step, we can simulate the dynamic evolution of the bond structure. The
period $t_Q$ of the lattice vibration is about 4 x 10^-14 s, and since the time step $\tau$ must satisfy $\tau < t_Q$,
we choose $\tau$ = 1 fs. The damping term is $-\lambda m\, d\phi/dt$, and $\lambda$ should be < $1/t_Q$ [10,11]. The
results show that changes in $\lambda$ do not influence the main character of the relaxation process.
In our calculation, we have chosen the parameters appropriate to cis-polyacetylene,

$t_0$ = 2.5 eV, $t_e$ = -0.05 eV, $\alpha$ = 41 eV/nm, $K'$ = 1.25 $\alpha$,
$K$ = 2.1 x 10^3 eV/nm^2, a = 0.122 nm, m = 13 u, N = 100.
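The step-by-step integration with a damping term can be illustrated on a deliberately simplified stand-in. The sketch below is a toy model and not the simulation described here: the force from eq 2.6 is replaced by a single harmonic restoring force, and the function name, the 40 fs period, the damping rate of 0.02 per fs (chosen below 1/t_Q) and the number of steps are all assumptions made for illustration.

```python
import math

def relax(phi0, period_fs=40.0, lam=0.02, tau_fs=1.0, n_steps=400):
    """Integrate phi'' = -omega^2 * phi - lam * phi' with a fixed time step.

    Toy stand-in: one harmonic coordinate replaces the full lattice force.
    """
    omega = 2.0 * math.pi / period_fs
    phi, vel = phi0, 0.0
    trajectory = [phi]
    for _ in range(n_steps):
        acc = -omega ** 2 * phi - lam * vel  # restoring force plus damping
        vel += acc * tau_fs                  # update the velocity first ...
        phi += vel * tau_fs                  # ... then the coordinate
        trajectory.append(phi)
    return trajectory

trajectory = relax(1.0)
# Largest displacement during the last 50 fs, relative to the initial value of 1.
print(max(abs(x) for x in trajectory[-50:]))
```

Updating the velocity before the coordinate (semi-implicit Euler) keeps the scheme stable as long as the time step is well below the vibration period, which is the role the condition on the time step plays in the text.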
III. RESULTS


To simulate the combination of two excitons into one biexciton, we take two excitons that
are close to each other as the initial lattice configuration (t = 0 in Fig. 2). For an exciton,
an electron is excited to the bottom of the conduction band (LUMO). For a biexciton, two
electrons with opposite spins are excited to the LUMO with two holes left in the HOMO. Figure
1 shows the evolution of the 50th energy level $E_{50}$ under different electric fields. At t = 0,
$E_{50}$ is the energy level of the lower gap state of the self-trapped exciton. As time passes,
it evolves into that of the self-trapped biexciton. Figure 2 shows the lattice configuration at
different times for E = 0.1 MV/cm. As can be seen from these figures, when t > 100 fs,
the change of the lattice configuration becomes smaller and smaller, and the configuration finally converges.

From Figs. 1 and 2 it is seen that the relaxation time is about 160 fs, which is about the
same as that of the photoexcitation process. As a result of exciton combination, the energy of
the system decreases, and the binding energy of the biexciton is 0.77 eV.


Figure Captions


Fig. 1. Evolution of the energy level $E_{50}$ under different electric fields: E = 0 and
E = 0.1 MV/cm.

Fig. 2. Evolution of the lattice configuration under E = 0.1 MV/cm: (a) t from 0 to 60
fs; (b) t from 80 to 100 fs; (c) t from 120 to 180 fs.


Fig. 2 (a)


PRECIPITATION AT EQUIVALENCE AND EQUILIBRIUM - A METHOD
FOR THE DETERMINATION OF EQUILIBRIUM CONSTANTS OF

A theoretical approach for the determination of the equilibrium constant, Ka, of the
reaction between a multideterminant antigen (Ag) and specific polyclonal antibodies
(Ab) forming the insoluble Ab/Ag immune complex is derived. The constant can be
expressed as a function of two accessible experimental parameters, the precipitating
concentration of the antigen and the Ab/Ag molar ratio. For this purpose the Ab/Ag
immune complex must be prepared at equivalence, and equilibrium between the
precipitated and soluble species must be reached. The proposed method is
experimentally tested on the system of human serum albumin (HSA) and polyclonal rabbit
antibodies. The Ab/Ag precipitates are prepared by the direct mixing of the biological
fluids in which the immunoreacting components naturally occur. Prior separation,
purification or labelling of the immunoreacting components is not required. The
conditions for the precipitation of Ab/Ag complexes at equivalence, the stoichiometric
composition or the average number of Ab molecules bound to one Ag molecule, and
the solubility of the immunoprecipitating components are determined by rectangular
two-dimensional double immunodiffusion. Since the solubility determined under the
conditions of double immunodiffusion results from the interplay of the global
diffusion of the precipitating components and particle growth kinetics, it mostly refers
to dynamic conditions. To find the solubility under equilibrium conditions,
it is sufficient to determine the minimal factor by which the solutions of both
immunoprecipitating components should be diluted so that no precipitate is formed
upon their mixing at equivalence. The dilution factor is determined by measuring
the laser light scattering of the immunoprecipitating systems prepared with serially
diluted Ag and Ab solutions.


INTRODUCTION 


Affinity, specificity and concentration of an antibody (Ab) determine its usefulness
for analytical, diagnostic and therapeutic purposes and are also implicated as important
factors in the immune response. Antibody affinity is defined as the attractive force
between an antigenic determinant (epitope) and the antibody combining site
(paratope). According to this definition, the affinity can be measured only when the
antigen (Ag) is a simple, well-defined substance such as a hapten. The determination of
the affinity of antibodies directed against a protein is in this sense impossible, because
of the multiplicity and heterogeneity of antigenic determinants. To describe quantitative
differences in the interactions between a multideterminant Ag and polyclonal Abs,
the term avidity was introduced, despite its lack of a precise thermodynamic and
immunochemical meaning. Avidity is the goodness of fit between more than one epitope
of the antigen and more than one site of the antibody. The rates of dissociation, the
solubilities of Ab/Ag complexes precipitated at the optimum proportions, the
deviations from linearity of the curve representing reciprocal concentrations of
bound antigen vs. reciprocal concentrations of free antigen, antigen binding
capacities, immunochemical titers, indices of avidity, as well as the data obtained by
measurements of 50% binding of antibodies by an antigen, are all related to the avidity.
However, avidities are often expressed through the association constants, Ka,
which represent the equilibrium between the association, ka, and dissociation, kd, rates of Ab/Ag
complexes. Because of the difficulties in reaching equilibrium caused by heterogeneous binding,
the application of solid-phase affinity methods to the
determination of Ka should be treated with caution. Despite the simplicity of
these methods, such as ELISA, various surface effects can cause errors in estimates
of either liquid- or solid-phase affinities and influence the ranking of affinities.

In this paper we propose a new approach to the determination of the equilibrium
constants of the reactions between multideterminant Ags and specific polyclonal Abs.
The method is based solely on the determination of the concentration of soluble Ag in
equilibrium with the insoluble Ab/Ag immune complex prepared at equivalence.


THEORETICAL CONSIDERATIONS

Equilibrium Constants of Polyclonal Antibodies Raised Against a
Multideterminant Antigen. An immune complex of the average composition ABn is
formed when one molecule of multideterminant antigen A reacts with n molecules of
the specific polyclonal antibodies B:

$$A + n\,B \rightleftharpoons AB_n$$

The average composition means that n need not necessarily be an integer.
If n is greater than unity, a portion of the immune complex ABn may be
precipitated. In this case the equilibrium is reached when a fully reversible reaction is
established between the precipitated complex ABnp (solid phase) and the species
remaining in the solution (solute phase):

$$AB_{np} \rightleftharpoons AB_{ns} + A_s + n\,B_s$$

The solute phase consists of the free components As and Bs, and the components As,b
and Bs,b bound in the soluble ABns complex. The reversible reaction between the soluble
immune complex and the free species in the solute phase also exists under equilibrium
conditions:

$$AB_{ns} \rightleftharpoons A_s + n\,B_s \tag{1}$$

In  the  immunoprecipitating  system  prepared  under  equivalence  conditions,  the  ratio 
of molar concentrations of B and A components, n, is:

$$n = c_B / c_A \tag{2}$$

and  is  the  same  in  the  precipitated  and  soluble  immune  complexes  and  corresponds  to 
the  ratio  of  free  B  and  A  species  in  the  solute  phase. 

According to eq 1, the equilibrium constant of the primary Ab/Ag reaction in a
system prepared at equivalence and equilibrium is:

$$K_a' = \frac{c_{As,b}}{c_{As}\, c_{Bs}^{\,n}} \tag{3}$$

By substituting cBs = n cAs from eq 2, the equilibrium constant equation becomes:

$$K_a' = \frac{c_{As,b}}{n^{n}\, c_{As}^{\,n+1}} \tag{4}$$

Ka' reflects the avidity of the antibodies, irrespective of the number of antibody
valences, i.e., binding sites, involved in the reaction. If the multideterminant
antigen bears distinct antigenic determinants, each of them is able to react with only
one binding site on the molecule of the specific antibody. Since antibody molecules
possess more than one binding site, the valency of the antibodies should be taken into
account in order to calculate the equilibrium constant. For instance, the equilibrium
constant of the reaction involving bivalent antibodies belonging to the IgG class reads:

$$\ln K_a = \frac{\ln K_a'}{2n} \tag{5}$$

Equivalence and Equilibrium Conditions. If the concentrations of the
antigen and antibody solutions are unknown, the equivalence conditions can be
determined by a rectangular two-dimensional double diffusion technique called the
"two-cross" immunodiffusion. The two-cross experimental set-up, which enables an
adequate solution of Fick's second law of diffusion applied to immunoprecipitation
in gels, is described in detail elsewhere. Briefly, a "cross" consists of
four troughs cut at a right angle in a gel plate. The half-width of the trough is denoted
by h. The troughs of each cross are filled in alternate order with antigen solution
(component A) and immune serum (component B). In the second cross, the solutions
of both precipitating components are diluted by the same factor, d. The distances
between peaks of the precipitin lines in the direction of the diffusion of antigen, x, and
antibodies, y, are measured in both crosses at the same time, t. The volume ratio of the
solutions of the precipitating components required to ensure equivalence conditions
during the preparation and precipitation of immune complexes is given by:

$$v_B / v_A = a\, d^{\,b} \tag{6}$$

where 

$$a = \left[ \frac{x_1^2 - x_2^2}{y_1^2 - y_2^2} \right]^{1/2}$$

$$b = \frac{x_2^2\, y_1^2 - x_1^2\, y_2^2}{(x_1^2 - x_2^2)(y_1^2 - y_2^2)}$$

The subscripts 1 and 2 refer to the parameters measured in the first and the second
cross, respectively.

In addition, the two-cross immunodiffusion technique enables the direct determination
of the reciprocal precipitating titers (eqs 7 and 8) and the diffusion coefficients (eqs 9
and 10) of the reacting molecules. Precipitating titers, PT, are defined as the ratio of the
equivalent molar concentrations of the substance at the origin of diffusion, Co, and at
the point of the onset of precipitation, Cpr:

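$$1/PT_A = (W/x_1)\, X\, e^{-X^2/2} = c_{Apr}/c_{Ao} \tag{7}$$

$$1/PT_B = (W/y_1)\, Y\, e^{-Y^2/2} = c_{Bpr}/c_{Bo} \tag{8}$$

where:

$$W = h\,(2/\pi)^{1/2}$$

$$X = x_1 \left[ \frac{2 \ln d}{x_1^2 - x_2^2} \right]^{1/2}$$

$$Y = y_1 \left[ \frac{2 \ln d}{y_1^2 - y_2^2} \right]^{1/2}$$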

The precipitating concentrations, Cpr, can be calculated from the precipitating
titers and the initial concentrations of the solutions of the precipitating
components, when both are known. If the initial concentrations of the antibody and
antigen solutions are unknown, they can be determined directly in crude biological
fluids in a manner described in detail elsewhere.

The diffusion coefficients are obtained from the relations:

$$D_A = \frac{1}{t} \left[ \frac{x_1^2 - x_2^2}{4 \ln d} - \frac{h^2}{6} \right] \tag{9}$$

$$D_B = \frac{1}{t} \left[ \frac{y_1^2 - y_2^2}{4 \ln d} - \frac{h^2}{6} \right] \tag{10}$$
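
As a rough illustration of how eqs 9 and 10 might be evaluated, the following minimal sketch computes a diffusion coefficient from the two-cross readings. It assumes the trough correction term is h^2/6 and that distances are given in cm and time in s; the function name and the sample readings are hypothetical and are not data from this work.

```python
import math


def diffusion_coefficient(dist_cross1, dist_cross2, dilution, half_width, time_s):
    """Diffusion coefficient (cm^2/s) from eq 9 (antigen) or eq 10 (antibody).

    dist_cross1, dist_cross2: precipitin-line distances in the first and
    second cross (x1, x2 for antigen; y1, y2 for antibody), in cm.
    dilution: factor d by which the second cross is diluted (d > 1).
    half_width: trough half-width h, in cm.
    time_s: time t at which the distances were measured, in s.
    """
    if dilution <= 1.0:
        raise ValueError("dilution factor d must exceed 1")
    spread = (dist_cross1 ** 2 - dist_cross2 ** 2) / (4.0 * math.log(dilution))
    return (spread - half_width ** 2 / 6.0) / time_s


# Hypothetical readings, for illustration only (not data from the paper).
d_example = diffusion_coefficient(1.0, 0.8, 2.0, 0.1, 86400.0)
print(d_example)
```

The same function serves both components, since eqs 9 and 10 differ only in whether the x or the y distances are inserted.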

From  the  data  obtained  for  the  diffusion  coefficients,  the  approximate  values  of  the 
molecular  masses  of  the  Ag  and  Ab  molecule  can  be  calculated  using  a  simple  relation: 

M  =  Migo  (D,gG/D)'  (11) 

MIgG = 150 000 Da and DIgG = 4.1 x 10^-7 cm^2/s are the molecular mass and the diffusion
coefficient of human IgG, respectively. D is the diffusion coefficient of the Ag or Ab
molecule referring to free diffusion in distilled water at 20 °C.


From  the  conservation  of  mass,  the  precipitating  concentrations  referring  to  the 
equilibrium  conditions  correspond  to  the  sum  of  the  concentrations  of  free  reactant  and 
the  concentration  of  reactant  bound  in  a  soluble  immune  complex: 

$$c_{Apr} = c_{As,b} + c_{As} \tag{12}$$

$$c_{Bpr} = c_{Bs,b} + c_{Bs} \tag{13}$$

According to the equivalence rule, precipitation under conditions of double
diffusion starts at the equivalent molar concentrations of both precipitating
components, in accordance with eq 2:

$$n = c_{Bpr} / c_{Apr} \tag{14}$$

This means that the equilibrium constant equation (eq 4) can be solved using Cpr
values from eq 7 or 8. The critical precipitating concentrations in eqs 7 and 8 represent
the solubility of an immunoprecipitating component under dynamic conditions, as a
result of the interplay between the global diffusion of the precipitating components and
the particle growth kinetics. Thus, the solubility, Cpr, has a kinetic and not a
thermodynamic significance. If enough time is allowed for the precipitation
in solution to reach equilibrium, the solubility referring to thermodynamic
conditions can be found. If not, it is sufficient to determine the minimal factor, mo,
by which both solutions of the A and B components should be diluted so that no
precipitate is formed upon their mixing at equivalence. The concentrations, Cpr/mo,
referring to the equilibrium conditions are denoted as Cpr*.
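
The chain from a measured titer to the equilibrium concentration can be sketched as follows. This is a minimal illustration, assuming Cpr = Co/PT (eqs 7 and 8) and Cpr* = Cpr/mo as stated above; the function names are hypothetical, and the numbers are those reported for immune serum 1 at pH 5.0 and 20 °C (cAo = 0.500 g/l, PTA = 356, mo = 3.95, antigen molecular mass 65 000 Da).

```python
def precipitating_concentration(c0, titer):
    """Cpr from the titer definition in eqs 7 and 8: Cpr = Co / PT."""
    return c0 / titer


def equilibrium_concentration(c_pr, m0):
    """Cpr* = Cpr / mo, the concentration corrected to equilibrium conditions."""
    return c_pr / m0


def to_molar(c_grams_per_litre, molecular_mass):
    """Convert g/l to mol/l using the molecular mass in g/mol (Da)."""
    return c_grams_per_litre / molecular_mass


c_apr = precipitating_concentration(0.500, 356)                        # g/l
c_apr_star = to_molar(equilibrium_concentration(c_apr, 3.95), 65000)   # mol/l
print(c_apr, c_apr_star)
```

With these inputs c_apr comes out near 1.40 x 10^-3 g/l, in line with the corresponding entry of Table 2.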

In order to solve eq 4 using cApr* and/or cBpr*, the concentrations of the free
components and of the components bound in an immune complex in the solute phase
(eqs 12 and 13) should be known. The solution is possible by introducing the ratio, r,
of the concentrations of bound and free antigen in the solute phase:

$$r = c_{As,b}^{*} / c_{As}^{*}$$

The equilibrium concentrations cAs* and cAs,b*, according to the definition of cAs
and cAs,b given by eq 12, can now be expressed as follows:

$$c_{As}^{*} = \frac{c_{Apr}^{*}}{r + 1} \tag{15}$$

$$c_{As,b}^{*} = \frac{c_{Apr}^{*}\, r}{r + 1} \tag{16}$$

The equilibrium constant equation (eq 4) in terms of cApr*, r and n reads:

$$K_a' = \left( \frac{1}{c_{Apr}^{*}} \right)^{n} r \left[ \frac{r + 1}{n} \right]^{n} \tag{17}$$

Equation 17 consists of two factors: the factor (1/cApr*)^n and a factor F which is a
function of r and n:

$$F = r \left[ \frac{r + 1}{n} \right]^{n} \tag{18}$$

The molar concentrations of both bound and free antigen in the solute phase are
extremely low, and their experimental determination, as well as the determination of
their ratio, r, is difficult. When the value of n in eq 18 is varied at a constant r, the
F function reaches a maximum (Figure 1). The maximum of the F function corresponds
to the maximal stability of an immunoprecipitating system prepared at equivalence
under given experimental conditions (pH, temperature, ionic strength, etc.), and the
Ka' equation now reads:

$$K_a' = \left( \frac{1}{c_{Apr}^{*}} \right)^{n} F_{max} \tag{19}$$

Fmax is solely a function of n. In order to find Fmax, the derivative of F with respect
to n should be zero:

$$\frac{dF}{dn} = \frac{r}{n^{n}} \left[ (r+1)^{n} \ln (r+1) - (r+1)^{n} (\ln n + 1) \right] = 0$$

This means that:

$$\ln (r + 1) = \ln n + 1 = \ln (n e)$$

i.e.

$$r = n e - 1 \tag{20}$$

Introducing eq 20 into eq 18 gives:

$$F_{max} = (n e - 1)\, e^{n} \tag{21}$$

where e is the base of the natural logarithms.
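
The behaviour of the F function can be checked numerically. The sketch below is only an illustration of eqs 18, 20 and 21 with hypothetical function names: it evaluates F, the ratio r at which the maximum occurs for a given n, and the resulting Fmax.

```python
import math


def f_factor(r, n):
    """Factor F of eq 18: F = r * ((r + 1) / n) ** n."""
    return r * ((r + 1.0) / n) ** n


def r_at_maximum(n):
    """Bound/free antigen ratio of eq 20, r = n*e - 1."""
    return n * math.e - 1.0


def f_max(n):
    """Maximal factor of eq 21: Fmax = (n*e - 1) * e**n."""
    return (n * math.e - 1.0) * math.exp(n)


# For a fixed r, F evaluated at the matching n is not exceeded by nearby n values.
r_fixed = r_at_maximum(2.0)
print(f_factor(r_fixed, 1.9), f_factor(r_fixed, 2.0), f_factor(r_fixed, 2.1))
print(f_max(1.5))
```

Evaluating F on either side of n = (r + 1)/e is a quick way to confirm that the stationary point of eq 20 is indeed a maximum.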

The solution of the equilibrium constant eqs 4 and/or 17 in terms of cApr* and n is:

$$K_a' = \left( \frac{e}{c_{Apr}^{*}} \right)^{n} (n e - 1) \tag{22}$$
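
Eq 22 combined with eq 5 can be evaluated in a few lines. The following is a minimal sketch with hypothetical function names; it works with logarithms to avoid very large intermediate numbers and assumes bivalent (IgG class) antibodies, cApr* in mol/l and n > 1/e. The inputs in the example are the reported values for immune serum 1 at pH 5.0 and 20 °C (cApr = 1.4045 x 10^-3 g/l, mo = 3.95, mean n = 2.085, antigen molecular mass 65 000 Da).

```python
import math


def ln_ka_prime(c_apr_star, n):
    """Natural log of Ka' from eq 22: Ka' = (e / cApr*)**n * (n*e - 1)."""
    if c_apr_star <= 0.0 or n * math.e <= 1.0:
        raise ValueError("need cApr* > 0 and n > 1/e")
    return n * (1.0 - math.log(c_apr_star)) + math.log(n * math.e - 1.0)


def ka_bivalent(c_apr_star, n):
    """Ka for bivalent antibodies from eq 5: ln Ka = ln Ka' / (2n)."""
    return math.exp(ln_ka_prime(c_apr_star, n) / (2.0 * n))


# Serum 1, pH 5.0, 20 °C: g/l -> mol/l, then corrected by the dilution factor mo.
c_star = 1.4045e-3 / 65000.0 / 3.95
ka_example = ka_bivalent(c_star, 2.085)
print(ka_example)
```

For these inputs the result is about 3.2 x 10^4 l/mol, close to the corresponding entry of Table 3.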


MATERIALS  AND  METHODS 

Immunogen and Antigen. Immunochemically pure human serum albumin (HSA),
pI 4.7, M.w. 65 kDa, was prepared under non-denaturing conditions and contained
97.6% of monomer and 2.4% of dimer (Calbiochem-Behring Corp., La Jolla, Ca.,
USA). The same HSA was used for the immunization of rabbits (immunogen) and as
the antigen in the precipitation experiments. The concentration of the antigen solutions
amounted to cAo = 0.500 g HSA/l.

Animals, Immunization and Immune Sera. Three randomly chosen male New
Zealand rabbits were immunized successively at two-week intervals with the HSA. Each
dose of the immunogen contained 0.75 mg of HSA in a volume of 1 ml of
complete (CFA) or incomplete Freund's adjuvant (IFA) or saline solution. For the
primary immunization, the immunogen was emulsified in CFA and administered
i.d. in the region of the peritoneal cavity. For the first and the second booster doses the
HSA was emulsified in IFA and administered i.d. in the region of the peritoneal
cavity. For the third booster dose a solution of HSA in saline was applied i.v. in ear
veins.

Two weeks after receiving the last booster dose, the rabbits were bled and the blood
samples collected separately. The immune sera were decomplemented at 56 °C for 30
min. The antibody concentrations of the immune sera were previously determined by a
microgravimetric method.

The Two-Cross Immunodiffusion Experiments. For the two-cross
immunodiffusion experiments 1% w/v agarose gel was prepared using agarose L
(Behring Institute, W. Germany). Phosphate buffered saline (PBS), pH 5.0, 5.5, and
7.0, contained 0.05 M KH2PO4, 0.10 M NaCl, 0.1% w/v NaN3 and a variable amount
of NaOH. The borate buffered saline (BBS) contained 0.05 M H3BO3, 0.1 M NaCl and
0.1% w/v NaN3, and NaOH was added in order to reach pH 8.6. These buffer solutions
were used to dilute the precipitating components and to equilibrate the agarose in
which the two-cross immunodiffusion experiments were performed. The
immunodiffusion experiments were carried out at 20 and 40 °C. For all technical
details concerning the two-cross immunodiffusion experimental procedure and the
evaluation of the results, refer to Pokric and Pucar, or to Zivkovic et al. for a somewhat
less detailed description.

The Determination of the Solubility of Immune Complexes. The solubility of the
immune complexes was determined by consecutive dilutions of the solutions of the
precipitating components until the dilution, mo, at which no precipitate is formed. The
starting precipitating system was prepared at equivalence by mixing 20 µl of the
immune serum with a volume of antigen solution calculated by eq 6. For the
dilutions, the total volume of the precipitating system was kept constant, but the
volume fractions of the solutions of the precipitating components were successively
reduced while the volume of the buffer solution was increased accordingly. Relative
laser light scattering (%RLLS) measurements were performed using a Hyland laser
nephelometer PDQ(TM) (Travenol Laboratories, Costa Mesa, Ca, USA). For each
dilution, the measurement was carried out until the maximum value of scattered light,
(%RLLS)max, was reached. The experiments were performed at 20 and 40 °C.


RESULTS 

Figure 1 illustrates the changes of the factor F (eq 18) vs. n for certain values of r.
The molar Ab/Ag ratio, n, refers to the immune complex prepared at equivalence. The
ratio of the molar concentrations of bound Ag over free Ag, r = cAs,b*/cAs*, is related
to the equilibrium reached in the solute phase. Figure 1 shows that for each of the
chosen r values ranging between 5 and 12.5, and n values varying from 1 to 7, the
factor F reaches a maximum. The n values must lie in a limited range. Values of n
< 2 will seldom give precipitates at equivalence. Values of n > 5 will occur only
exceptionally, and then the maxima of the F function vs. n (Figure 1) have a very steep
shape. The latter case would drastically increase the equilibrium constant (eq 19). For n
values between 1.5 and 5.5, and according to eq 21, the values of Fmax will lie between
13.79 and 3412. It follows from eq 18 that in this case the molar ratios, r, of
associated, cAs,b, and free antigen, cAs, in the solute phase will predominantly lie
between 3.07 and 13.95.

The diffusion coefficients of antigen and antibody, DA = 6.1 x 10^-7 cm^2/s and DB =
4.1 x 10^-7 cm^2/s, determined by the immunodiffusion method (eqs 9 and 10),
correspond to HSA and IgG class rabbit antibodies having molecular masses MA =
65 000 Da and MB = 160 000 Da (eq 11), respectively.

An example of the determination of the dilution factor, mo, at which no detectable
precipitate is formed, is presented in Figure 2. The quantity of the precipitate formed
was determined by laser light scattering measurements. A strict linearity (correlation
coefficient r > 0.9) between the intensity of scattered light, (%RLLS)max, and the
dilution, 1/m, was obtained in all examined systems. Thus, it is not necessary to find
experimentally the dilution at which precipitation no longer occurs. The extrapolation
of the straight line (%RLLS)max vs. 1/m to %RLLS = 0 gives the required dilution 1/mo.
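
The extrapolation described above amounts to fitting a straight line and finding its zero crossing. The following minimal sketch uses an ordinary least-squares fit written out by hand; the function name and the sample readings are hypothetical and are not measurements from this work.

```python
def extrapolate_m0(inverse_dilutions, rlls_max):
    """Estimate mo by extrapolating (%RLLS)max vs. 1/m to %RLLS = 0.

    inverse_dilutions: values of 1/m for the diluted systems.
    rlls_max: the corresponding maximum scattered-light readings.
    Returns mo, the reciprocal of the 1/m value at which the fitted line
    crosses zero.
    """
    count = len(inverse_dilutions)
    mean_x = sum(inverse_dilutions) / count
    mean_y = sum(rlls_max) / count
    sxx = sum((x - mean_x) ** 2 for x in inverse_dilutions)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(inverse_dilutions, rlls_max))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    inverse_m0 = -intercept / slope   # zero crossing of the fitted line
    return 1.0 / inverse_m0


# Hypothetical, perfectly linear readings that vanish at 1/m = 0.25.
m0_example = extrapolate_m0([1.0, 0.8, 0.6, 0.4], [37.5, 27.5, 17.5, 7.5])
print(m0_example)
```

With real readings the fit would also yield the correlation coefficient, which can be compared against the linearity criterion quoted in the text.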

The precipitating titers of both reactants determined by the two-cross
immunodiffusion (eqs 7 and 8) and the dilution factors required for the corrections to
equilibrium conditions are presented in Table 1. The values obtained for the dilution
factor, mo > 1, prove that the precipitating titers determined under conditions of
double diffusion do not refer to equilibrium conditions. Table 1 and Figure 2 show
that mo is always smaller at 40 than at 20 °C. This suggests that precipitation under
conditions of diffusion at higher temperatures occurs under conditions closer to those
of equilibrium.

The precipitating concentrations, Cpr, of antigen and antibodies at which
precipitation starts under conditions of double diffusion (eqs 7 and 8) are presented in
Table 2. Taking into account the antibody and antigen molecular masses, the molar
Ab/Ag ratio, n, required for the formation of the immune complex at equivalence was
calculated (eq 14). The stoichiometric composition of an Ab/Ag complex prepared
under identical experimental conditions but at two different temperatures is constant
(Table 2). The small differences between the n values at 20 and 40 °C arise from the
errors in the experimental determinations of the precipitating concentrations, cApr and
cBpr (eqs 7 and 8).

The mean values of n are used to calculate the equilibrium constants Ka'
(eq 22). In order to calculate Ka', the precipitating concentrations of antigen solutions
obtained in g/l (Table 2) must also be expressed in mol/l and corrected to equilibrium
conditions, cApr*, by using the dilution factors, mo (Table 1). Assuming that IgG class
Abs are bivalent, the equilibrium constants, Ka, are calculated according to eq 5 and
presented in Table 3. The precipitating titers of the antigen solution corrected to
equilibrium conditions, PTA*, are also presented in Table 3. Although the same antigen
solution of a constant concentration, cAo, was used in our experiments,
different PTA* values were obtained under different experimental conditions.
However, the data presented in Table 3 show that the Ka and PTA* values are well
related. Thus, the PTA* data could be used for a rough ranking of the avidities of
different immune sera for the same multideterminant antigen.

The lower Ka values at the higher temperature (Table 3) suggest that
antiHSA/HSA binding is an exothermic process.


DISCUSSION 

The method proposed for the determination of the equilibrium constants of the
reaction of a multideterminant antigen and the specific polyclonal antibodies requires
only the knowledge of two accessible experimental parameters: the concentration at
which the antigen starts to precipitate under equilibrium conditions and the molar
Ab/Ag ratio in the immune complex prepared at equivalence. The immune complexes
can be prepared by the direct mixing of biological fluids in which the immunoreacting
components naturally occur. This offers the great advantage of dealing with unmodified
molecules, since the separation, purification and labelling of either the antigen or the
antibody, which might modify the binding properties, is not required.

The different values of the precipitating titers of the antigen solution corrected to
equilibrium conditions, PTA* (Table 3), obtained for the same Ab/Ag system at
different pH values, as well as under identical experimental conditions at two
different temperatures (Table 1), prove that the solubility, cApr (eq 7), of an
HSA-antiHSA system depends on both pH and temperature. The changes of the
stoichiometry of an antiHSA-HSA pair at various pH values (Table 2), previously
observed in experiments with a number of Ab/Ag complexes, are caused by the changes
of the charge of the Ab and Ag molecules with ambient pH. The dependence of both the
solubility and the stoichiometry of the Ab/Ag system on the experimental conditions
explains the variations of the equilibrium constant values of a given antiHSA-HSA pair
under various experimental conditions (Table 3), since Ka is directly related to the
solubility, cApr, and the Ab/Ag ratio, n (eqs 5 and 22). This finding agrees with the
literature data that the value of the equilibrium constant of an Ab/Ag system is affected
by the conditions under which the determination was carried out.

Part of the difficulty in determining the equilibrium constants of the
reaction between a multideterminant antigen and polyclonal antibodies lies in the fact
that Ka values often depend on the absolute amounts of antigen and antibodies,
the dilution and/or the volume of the immunoreacting system, as well as on the ratio
of Ab/Ag concentrations. Moreover, the state of equilibrium is disturbed and the
dissociation rate is greatly increased when one of the precipitating components is
present in great excess. In our experiments the Ab/Ag concentration ratio is
fixed in advance by preparing the precipitating system at equivalence, while the
Ka determined at equilibrium is independent of the total concentrations of antigen or
antibodies in biological fluids.

The determination of the equilibrium constant Ka in our experiments was possible
from the data obtained by the two-dimensional double immunodiffusion concerning the
preparation of an Ab/Ag system at equivalence, the precipitating titer of the antigen
solution, PTA, and the diffusion coefficients, D, and/or molecular masses, M, of the
immunoreacting molecules. The concentration of the antigen solution, cAo, should be
known or determined in advance in order to calculate the critical precipitating
concentration of antigen, cApr (eq 7), required for the determination of Ka (eq 22).
The determination of the precipitating titer (PT) by the two-cross immunodiffusion
requires neither standards nor knowledge of the concentrations of the solutions of the
precipitating components. For a constant concentration of the antigen solution, cAo,
the precipitating titers referring to equilibrium conditions, PTA*, depend solely on the
critical precipitating concentrations, cApr* (eq 7). Furthermore, according to the
theory, different concentrations of identical antibodies obtained by dilutions of an
immune serum would not influence the PTA* and/or cApr* values. The comparison of
the PTA* and Ka values (Table 3) shows that they are well related. Thus, when the
concentrations and molecular masses of antigens and antibodies are unknown or
difficult to determine, the PTA* could be used for a rough ranking of the relative
affinities of different immune sera against the same antigen.

Table 1 The Precipitating Titers of the Antigen (PTA) and Antibody (PTB) Solutions,
and the Minimal Dilution Factors (mo) in antiHSA/HSA Systems.

immune               20 °C                   40 °C
serum     pH     PTA    PTB    mo        PTA    PTB    mo

  1       5.0    356    265    3.95      549    400    1.66
          5.5    509    262    4.90      565    293    3.22
          7.0    509    348    2.71      562    384    1.79
          8.6    659    412    3.48      735    434    2.37

  2       5.0    339    377    5.99      387    448    4.93
          5.5    470    373    7.75      551    448    6.37
          7.0    487    470    5.15      551    554    3.72
          8.6    682    604    5.88      718    638    4.86

  3       5.0    270    189    3.72      462    321    1.94
          5.5    382    189    4.76      554    271    3.21
          7.0    382    250    3.02      547    349    1.93
          8.6    508    280    3.10      761    407    1.52

The precipitating titers and dilution factors are dimensionless quantities.

Table 2 The Precipitating Concentrations (Cpr) and the Ab/Ag Molar Ratio (n) in
antiHSA/HSA System Prepared at Equivalence.

immune                    20 °C                            40 °C
serum     pH    cApr x 10^3 *  cBpr x 10^3 *  n ^    cApr x 10^3 *  cBpr x 10^3 *  n ^    mean n

  1       5.0      1.4045         6.6868      2.06      0.9108         4.4300      2.11    2.085
          5.5      0.9823         6.7633      2.98      0.8850         6.0477      2.96    2.970
          7.0      0.9823         5.0919      2.25      0.8897         4.6145      2.25    2.250
          8.6      0.7194         4.3009      2.59      0.6803         4.0829      2.60    2.595

  2       5.0      1.4749         7.1618      2.10      1.2920         6.0268      2.02    2.060
          5.5      1.0638         7.2386      2.95      0.9074         6.0268      2.88    2.915
          7.0      1.0267         5.7447      2.43      0.9074         4.8736      2.33    2.380
          8.6      0.7331         4.4702      2.64      0.6964         4.2320      2.63    2.635

  3       5.0      1.8519         7.1058      1.67      1.0823         4.1838      1.68    1.675
          5.5      1.3089         7.1058      2.35      0.9025         4.9557      2.38    2.365
          7.0      1.3089         5.3720      1.78      0.9141         3.8481      1.82    1.800
          8.6      0.9843         4.7964      2.11      0.6570         3.2998      2.18    2.145

* The precipitating concentrations, expressed in g/l, are calculated (eqs 7 and 8) taking into
account that the concentration of HSA solutions (A) amounted to cAo = 0.500 g/l and the
concentrations of anti HSA in rabbit sera (B) amounted to cBo = 1.772 g/l (rabbit 1), cBo =
2.700 g/l (rabbit 2), and cBo = 1.343 g/l (rabbit 3).

^ n is calculated (eq 14) taking into account that the molecular masses of antigen and
antibodies amount to 65 000 Da and 160 000 Da, respectively.

Table 3 The Precipitating Titers of Antigen Solution (PTA*) Referring to the
Equilibrium Conditions and the Equilibrium Constants of antiHSA/HSA Reactions
(Ka).

immune                  20 °C                        40 °C
serum     pH     PTA* ^    Ka x 10^-4 #       PTA* ^    Ka x 10^-4 #

  1       5.0     1406        3.23              911        2.60
  2       5.0     2031        3.89             1908        3.76
  3       5.0     1004        2.75              869        2.60

  1       5.5     2494        4.13             1819        3.53
  2       5.5     3643        5.00             3510        4.19
  3       5.5     1818        3.62             1778        3.58

  1       7.0     1379        3.17             1006        2.71
  2       7.0     2508        4.25             2066        3.85
  3       7.0     1154        2.95             1056        2.82

  1       8.6     2419        4.14             1742        3.51
  2       8.6     4010        5.13             3490        4.95
  3       8.6     1575        3.41             1157        2.92

^ The precipitating titers are dimensionless quantities.

# The equilibrium constants are expressed in l/mol and
calculated according to eqs 5 and 22. For this purpose the
precipitating concentrations of antigen solutions in g/l (Table 2)
are transformed to equilibrium conditions by using the dilution
factor mo (Table 1) and expressed in mol/l, taking into account
that the molecular mass of the antigen is 65 000 Da. The average of the n
values determined at 20 and 40 °C (Table 2) is introduced into
eqs 5 and 22.

0.1 Abstract

We  introduce  a  new  notion  that  connects  the  combinatorial  concept  of  regularity 
with  the  geometrical  notion  of  face-transitivity.  This  new  notion  implies  finiteness 
results  in  case  of  bounded  maximal  face  size.  We  give  lists  of  structures  for  some 
classes  and  investigate  polyhedra  with  constant  vertex  degree  and  faces  of  only  two 
sizes. 


1  Introduction 

A planar (finite or infinite) graph is called face-transitive if the automorphism group
acts transitively on the set of faces. For finite polyhedra (see [Ma71]) as well as for
infinite graphs in the plane with finite faces and finite vertex degree (that is, tilings;
see [Ba90][De90]) it is well known that the graph can be realized with its full
combinatorial automorphism group as its group of geometrical symmetries. Restricting
the attention to polyhedra with constant vertex degree, up to combinatorial
equivalence only the 5 Platonic solids have an automorphism group acting transitively on
their faces. In the remaining text we will restrict our attention to polyhedra with
constant vertex degree.

A natural generalisation of this concept - let us call it weakly face-transitive - is
to require that only faces of the same size are equivalent under the automorphism
group. If we define the 0-th corona of a face to be the face itself and the n-th corona
to be the set of all those faces that are contained in the (n-1)-th corona or share
an edge with it, we can further relax this concept and only require some coronas
of fixed size to be isomorphic by an isomorphism mapping the central faces onto
each other. A polyhedron with all n-coronas of faces of the same size isomorphic
is called weakly n-transitive. Obviously, all polyhedra are weakly 0-transitive and if
a polyhedron is weakly (n+1)-transitive, it is also weakly n-transitive. So the first
interesting case to study is the case of weakly 1-transitive polyhedra. Still relaxing
this condition by not requiring the first coronas to be isomorphic, but just to be
isomorphic as multisets (that is: every face of a given size i must have the same
number of neighbours of size i' for every i'), still gives a very restrictive condition
and as we will see, it already implies finiteness in case the maximal size of a face
is bounded. We call this condition (strong) face-regularity. So the class of all
face-regular polyhedra contains all weakly n-face-transitive polyhedra for any n ≥ 1, and
therefore also the weakly face-transitive or even face-transitive ones.

*e-mail: gunnar@mathematik.uni-bielefeld.de

^e-mail: deza@dmi.ens.fr
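
The multiset condition defining face-regularity can be tested mechanically. The sketch below is only an illustration under stated assumptions: a polyhedron is given as a list of faces, each face a cyclic list of vertex labels, and two faces are neighbours when they share an edge. The function names and the three example solids (cube, triangular prism, and a cube capped by a pyramid, which does not have constant vertex degree and serves purely as a negative example) are choices made here, not taken from the lists of this paper.

```python
from collections import Counter


def face_neighbour_profiles(faces):
    """For each face, count its edge-adjacent neighbours by face size."""
    edge_to_faces = {}
    for index, face in enumerate(faces):
        for a, b in zip(face, face[1:] + face[:1]):
            edge_to_faces.setdefault(frozenset((a, b)), []).append(index)
    profiles = [Counter() for _ in faces]
    for sharing in edge_to_faces.values():
        if len(sharing) == 2:          # an edge of a polyhedron lies on two faces
            f, g = sharing
            profiles[f][len(faces[g])] += 1
            profiles[g][len(faces[f])] += 1
    return profiles


def is_face_regular(faces):
    """True if all faces of equal size have the same neighbour-size multiset."""
    profile_by_size = {}
    for face, profile in zip(faces, face_neighbour_profiles(faces)):
        if profile_by_size.setdefault(len(face), profile) != profile:
            return False
    return True


cube = [[0, 1, 2, 3], [4, 5, 6, 7], [0, 1, 5, 4],
        [1, 2, 6, 5], [2, 3, 7, 6], [3, 0, 4, 7]]
prism = [[0, 1, 2], [3, 4, 5], [0, 1, 4, 3], [1, 2, 5, 4], [2, 0, 3, 5]]
capped_cube = [[0, 1, 2, 3], [0, 1, 5, 4], [1, 2, 6, 5], [2, 3, 7, 6],
               [3, 0, 4, 7], [4, 5, 8], [5, 6, 8], [6, 7, 8], [7, 4, 8]]

print(is_face_regular(cube), is_face_regular(prism), is_face_regular(capped_cube))
```

In the capped cube the bottom square has four square neighbours while each side square has three squares and one triangle, so the check fails for it.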

The same concept can be reached by strengthening the notion of a monochromatic
regular dual: Let p_i denote the number of i-gons in a given polyhedron. We use
the notation p = (p_3, p_4, ..., p_b) for the face-vector (or p-vector) of a polyhedron;
b is the maximal number for which a face with size b exists.

A less restrictive definition of face-regularity, but only for bifaced polyhedra,
was considered in [DGr97c]. Namely, if only p_a and p_b are non-zero and a < b,
then the number j of i-faces, edge-adjacent to any given i-face, was required to be
independent of the choice of the i-face, for i either a or b. For a k-valent polyhedron
we write aRj or bRj, if this partial (or weak) face-regularity holds for a-gonal or,
respectively, b-gonal faces. All such simple polyhedra with b ≤ 6, as well as all
4-valent ones with b = 4, except the cases 4R0 for (k; a, b) = (3; 4, 6) and aR0, aR1
for (k; a, b) ∈ {(4; 3, 4), (3; 5, 6)} were found in [DGr97c]. For example, all 12 (resp.
6, 4, 10, 26) polyhedra bRj for all five possible cases - k = 4; k = 3, b < 6 and
k = 3, b = 6, a ∈ {3, 4, 5} - are listed there. (The graphs of all 26 6Rj fullerenes
(i.e. (k, a, b) = (3, 5, 6)) are given in List 7 below.) In these cases 8 (resp. 6, 4, 9, 12)
polyhedra are also aRj, i.e. face-regular in the sense of the present paper.

The  face-regularity  which  we  consider,  is  a  purely  combinatorial  property  of  the 
skeleton  of  a  polyhedron.  It  is  different  from  the  affine  notion  of  regular-faced  (i.e. 
all  faces  being  regular  polygons)  polyhedra. 
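Because the property depends only on which faces share an edge, it can be tested directly on face-adjacency data. The following is a minimal sketch, assuming a hypothetical input format (a dictionary of face sizes and a dictionary of edge-adjacent faces); the function name `face_regularity` is ours and not part of the paper.

```python
from collections import Counter


def face_regularity(size, adjacent):
    """Return {(i, j): f(i, j)} if the input is face-regular, else None.

    size[x] is the number of edges of face x; adjacent[x] lists the faces
    sharing an edge with x (hypothetical input format).
    """
    sizes = sorted(set(size.values()))
    f = {}
    for x, nbrs in adjacent.items():
        counts = Counter(size[y] for y in nbrs)
        for j in sizes:
            # The first face of each size fixes f(i, j); later ones must agree.
            if f.setdefault((size[x], j), counts[j]) != counts[j]:
                return None
    return f


# Triangular prism: two triangles (T0, T1) and three squares (S0, S1, S2).
size = {"T0": 3, "T1": 3, "S0": 4, "S1": 4, "S2": 4}
adjacent = {
    "T0": ["S0", "S1", "S2"],
    "T1": ["S0", "S1", "S2"],
    "S0": ["T0", "T1", "S1", "S2"],
    "S1": ["T0", "T1", "S0", "S2"],
    "S2": ["T0", "T1", "S0", "S1"],
}
prism_f = face_regularity(size, adjacent)
```

For the triangular prism every triangle has three square neighbours and every square has two triangular and two square neighbours, which is the neighbour data recorded for the 3-gonal prism in List 1.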

We use the abbreviation frp for face-regular polyhedron. An frp in one of the lists
below is described by $i_j$, where j is the number of the List and i is its number in
List j. We also use the notation $i$ for $i_1$.

We call two frp fr-isomers if they have the same parameters as frp, i.e. v, the
p-vector and the numbers f(a, b), i.e. the number of b-faces edge-adjacent to each
a-face, coincide for any a, b.
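Finding fr-isomers amounts to grouping polyhedra by their full parameter set. The sketch below uses the parameters read from Nrs. 10 to 13 of List 1; the tuple layout and the function name `fr_isomer_classes` are illustrative assumptions, not the authors' data format.

```python
from collections import defaultdict

# (number in List 1, v, {face size: (count, neighbour vector for sizes 3..6)})
records = [
    (10, 12, {6: (4, (3, 0, 0, 3)), 3: (4, (0, 0, 0, 3))}),
    (11, 16, {6: (6, (2, 0, 0, 4)), 3: (4, (0, 0, 0, 3))}),
    (12, 16, {6: (6, (2, 0, 0, 4)), 3: (4, (0, 0, 0, 3))}),
    (13, 28, {6: (12, (1, 0, 0, 5)), 3: (4, (0, 0, 0, 3))}),
]


def fr_isomer_classes(records):
    """Group list numbers whose v, p-vector and f(a, b) all coincide."""
    groups = defaultdict(list)
    for nr, v, faces in records:
        key = (v, tuple(sorted(faces.items())))
        groups[key].append(nr)
    # Only classes with at least two members are fr-isomers.
    return [nrs for nrs in groups.values() if len(nrs) > 1]


classes = fr_isomer_classes(records)
```

On this sample the only class is the pair 11, 12 with v = 16; the symmetry group is deliberately not part of the key, since fr-isomers may differ in symmetry.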

The fr-isomers in List 1 are the bifaced 11, 12 (v = 16); 20, 21 (v = 32);
32, 33 (v = 80) and the 3-faced 49, 50 (v = 20).

All fr-isomers in List 2 are:
for v = 20: $10_2$, $11_2$;
for v = 24: $61_2$, $62_2$;
for v = 26: $16_2$ to $19_2$;
for v = 28: $66_2$, $67_2$; $69_2$, $70_2$; $72_2$ to $74_2$; $75_2$, $77_2$; $76_2$, $78_2$;
for v = 32: $28_2$ to $31_2$; $32_2$ to $34_2$; $87_2$, $88_2$; $89_2$, $90_2$;
for v = 36: $2_2$, $3_2$; $95_2$, $96_2$; $102_2$, $103_2$;
for v = 40: $42_2$, $43_2$;
for v = 44: $5_2$, $6_2$; $49_2$ to $51_2$; $119_2$, $120_2$; $137_2$ to $139_2$.

All fr-isomers in Lists 4 and 5 are $6_4$, $7_4$ with v = 14.

Considering the polyhedra of Lists 1, 2 and 3 with respect to collapsing of all
triangular faces to points (i.e. the inverse of vertex-truncation), we see that in List
1 any such collapsing gives a member of List 1. But in List 2 there are polyhedra
for which this collapsing does not give an frp. The smallest one is $116_2$.

Examples of sequences of frp, such that each of them comes from the previous one
by 1-edge truncation, are: 1, 4, 2, 6, 7, 8, 9; 1, 4, 35, 36, 59, 39, 11; and 1, 4, 2, 6, 14, 49.

Some infinite families of 3-valent frp:

Bifaced: $Prism_n$ and $Barrel_n$ (i.e. two n-gons separated by two layers of 5-gons);

3-faced: $Prism_n$, $Barrel_n$, truncated on all 2n vertices of both n-gons;
$Prism_{2n}$, edge-truncated on n disjoint edges of only one n-gon;
$Prism_{3n}$, edge-truncated on n edges, separated by at least 2 edges, of only one
n-gon;

4-faced: $Prism_n$, (vertex-)truncated on all vertices of only one n-gon;

5-faced: $Barrel_n$, truncated on all vertices of only one n-gon.

In fact, many of the frp in the lists are partial truncations of $Prism_n$ and
$Barrel_n$. For example, there are exactly 10 frp which are partial truncations of the
Cube: there are 1 (resp. 3, 1, 3, 1, 1) possibilities for truncations on 1 (resp. 2, 3, 4, 6, 8)
vertices.
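The parameters of the two bifaced families can be written down for every n. The sketch below is illustrative only: the function names are ours, and it assumes n ≠ 4 for the prism and n ≠ 5 for the barrel, so that the two face sizes are distinct. The helper `consistent` checks three necessary conditions: 3v equals the sum of the face sizes, each neighbour vector sums to the face size, and the edge count between i-gons and j-gons agrees from both sides.

```python
def prism(n):
    """v and {size: (count, {neighbour size: number})} for Prism_n, n != 4."""
    return 2 * n, {n: (2, {4: n}), 4: (n, {n: 2, 4: 2})}


def barrel(n):
    """Same for Barrel_n (two n-gons, two layers of pentagons), n != 5."""
    return 4 * n, {n: (2, {5: n}), 5: (2 * n, {n: 1, 5: 4})}


def consistent(v, faces, k=3):
    """Necessary conditions on the parameters of a k-valent frp."""
    ok_vertices = sum(i * c for i, (c, _) in faces.items()) == k * v
    ok_sizes = all(sum(nb.values()) == i for i, (_, nb) in faces.items())
    ok_edges = all(
        c * nb.get(j, 0) == faces[j][0] * faces[j][1].get(i, 0)
        for i, (c, nb) in faces.items()
        for j in faces
    )
    return ok_vertices and ok_sizes and ok_edges


prism6 = prism(6)
barrel6 = barrel(6)
```

For n = 6 this gives v = 12 and v = 24, the vertex numbers of the hexagonal prism and of the 24-vertex fullerene in List 1.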

Remarks: 

(i) Among the chiral polyhedra in the lists are, for example, Nrs. 41, 61, 62, 63, 100,
104 in List 2; Nr. 9 in List 3; and, especially, Nrs. 13, 22, 34 in List 1 and Nr. 9 in List 4
with symmetry T, O, I and O, respectively.

(ii)  None  of  the  polyhedra  in  any  of  our  Lists  has  a  trivial  symmetry  group. 


The  Finiteness  of  Classes  with  Bounded  Face  Size 


Theorem 1 For every $n \in \mathbb{N}$ there is only a finite number of face-regular polyhedra
with constant vertex degree and face sizes not exceeding n.


Proof. 

We will assume that the polyhedra in question all contain an n-gon. The total
number can be obtained by summing up over all $m \le n$.

Recall that for $i, j \in \mathbb{N}$ the number $f(i,j)$ denotes the number of neighbouring
j-gons of an i-gon. So, for $i \ne j$, $f(i,j)\,p_i = f(j,i)\,p_j$ is the number of edges between
i-gons and j-gons, and we can express $p_j$ as $p_j = \frac{f(i,j)}{f(j,i)}\,p_i$ if i-gonal and j-gonal
faces share at least one edge.

Look at the f-graph G with vertex set $V = \{i \mid p_i > 0\}$ and edge set
$E = \{\{i,j\} \mid f(i,j) > 0\}$. This graph is connected since the dual of the underlying
polyhedron is connected. We can express every other value $p_i$ by a formula of the kind

$$p_i = \frac{f(i_1,i)}{f(i,i_1)} \cdot \frac{f(i_2,i_1)}{f(i_1,i_2)} \cdots \frac{f(n,i_k)}{f(i_k,n)}\; p_n =: g(i)\,p_n$$

if $i, i_1, \ldots, i_k, n$ is a (e.g. shortest) path from i to n in G.

Since for fixed n all the $f(i,j)$ as well as the length of the path are bounded and
since the number of graphs on n vertices is also finite, we have only a finite number
of possible sets of equations $p_i = g(i)\,p_n$ ($3 \le i \le n$).

As a well known consequence of Euler's formula we get $\sum_{i=3}^{n}(6-i)\,p_i = 12$ in the
3-valent case, $\sum_{i=3}^{n}(4-i)\,p_i = 8$ for 4-valent polyhedra and $\sum_{i=3}^{n}(10-3i)\,p_i = 20$
for 5-valent polyhedra.

Substituting $p_i$ by $g(i)\,p_n$ in this formula, every set of equations gives at most one
solution for $p_n$ and therefore also for each $p_i$. So for every set of equations there is a
well determined number of faces and therefore there is a maximum number of faces
that is possible.

□ 
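The substitution step of the proof can be carried out mechanically. The following is a minimal sketch, not the authors' program: the function name `face_vector` and the input format are assumptions. It walks the f-graph from n to obtain the factors g(i) and then solves the Euler identity, written in the common form $\sum_i (2k-(k-2)i)\,p_i = 4k$, which for k = 3, 4, 5 is equivalent to the three identities used in the proof.

```python
from collections import deque
from fractions import Fraction


def face_vector(f, k=3):
    """Solve for the p-vector determined by the numbers f(i, j).

    f maps (i, j) to the number of j-gons adjacent to each i-gon and k is
    the vertex degree.  Returns {i: p_i}, or None if the parameters admit
    no solution in positive integers.
    """
    sizes = sorted({i for i, _ in f})
    n = sizes[-1]
    # Walk the f-graph from n, accumulating p_i = g(i) * p_n.
    g = {n: Fraction(1)}
    queue = deque([n])
    while queue:
        j = queue.popleft()
        for i in sizes:
            if i not in g and f.get((i, j), 0) > 0 and f.get((j, i), 0) > 0:
                g[i] = g[j] * Fraction(f[(j, i)], f[(i, j)])
                queue.append(i)
    if len(g) < len(sizes):
        return None  # the f-graph is not connected
    # Euler's formula: sum over i of (2k - (k - 2) i) p_i = 4k.
    total = sum((2 * k - (k - 2) * i) * g[i] for i in sizes)
    if total <= 0:
        return None
    p_n = Fraction(4 * k) / total
    p = {i: g[i] * p_n for i in sizes}
    if any(x.denominator != 1 for x in p.values()):
        return None
    return {i: int(x) for i, x in p.items()}


# Pentagons with 5 hexagonal neighbours, hexagons with 3 pentagons and 3 hexagons.
truncated_icosahedron = face_vector({(5, 5): 0, (5, 6): 5, (6, 5): 3, (6, 6): 3})
```

A solution returned here only satisfies the counting conditions; whether a polyhedron with these parameters exists still has to be decided by a separate search.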

Corollary  1  If  in  the  cubic  case  the  number  of  non-hexagons  is  bounded  or  in  the 
quartic  case  the  number  of  non-squares  is  bounded,  then  there  is  only  a  finite  number 
of  face-regular  polyhedra. 

Proof. 

The  fact  that  the  number  of  faces  smaller  than  6  (resp.  4)  is  bounded  gives 
an  upper  bound  on  the  maximum  face  size,  implying  the  result  by  the  previous 
theorem. 

□ 
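The bound on the maximal face size used in this proof can be made explicit. As a sketch, write N for the assumed bound on the number of non-hexagons (resp. non-squares) and b for the maximal face size; the symbol N is our notation and does not appear in the paper.

```latex
\begin{align*}
\sum_{i<6}(6-i)\,p_i &= 12+\sum_{i>6}(i-6)\,p_i
  && \text{(cubic case, from } \textstyle\sum_i (6-i)\,p_i = 12\text{)}\\
b-6 \;\le\; \sum_{i>6}(i-6)\,p_i &= \sum_{i<6}(6-i)\,p_i-12 \;\le\; 3N-12
  && \text{(if } b>6\text{, since } 6-i\le 3\text{)}\\
b-4 \;\le\; \sum_{i>4}(i-4)\,p_i &= p_3-8 \;\le\; N-8
  && \text{(quartic case, if } b>4\text{)}
\end{align*}
```

So under these assumptions b ≤ 3N − 6 in the cubic case and b ≤ N − 4 in the quartic case, and Theorem 1 applies with n equal to this bound.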

Statistics 

In this section we will give some statistics about the number of face-regular polyhedra
compared to the number of all polyhedra for some classes.

vertices         polyhedra    face-regular polyhedra
       4                 1                         1
       6                 1                         1
       8                 2                         2
      10                 5                         4
      12                14                         7
      14                50                         5
      16               233                        15
      18             1 249                         9
      20             7 595                        33
      22            49 566                        11
      24           339 722                        58
      26         2 406 841                        29
      28        17 490 241                        99
      30       129 664 753                        44
      32       977 526 957                       194
      34     7 475 907 149                        25
      36    57 896 349 553                       318

Table 1: Cubic polyhedra

vertices         polyhedra    face-regular polyhedra
       8                 1                         1
      10                 1                         1
      12                 2                         2
      14                 5                         3
      16                12                         3
      18                34                         1
      20               130                        10
      22               525                         2
      24             2 472                         8
      26            12 400                         5
      28            65 619                        10
      30           357 504                         7
      32         1 992 985                        30
      34        11 284 042                         1
      36        64 719 885                        22
      38       375 126 827                        16
      40     2 194 439 398

Table 2: Cubic polyhedra without triangles


vertices    polyhedra    face-regular polyhedra
       4            1                         1
       6            1                         1
       8            2                         2
      10            5                         4
      12           10                         7
      14           13                         3
      16           30                         7
      18           44                         2
      20           77                        10
      24          184                         6
      26          267                         2
      28          420                         3
      30          595                         1
      32          883                         5
      38        2 445                         1
      44        6 319                         1
      52       19 345                         1
      56       32 219                         2
      60       52 293                         1
      68      128 343                         1
      80      425 998                         2
     140          ???                         1

Table 3: Cubic polyhedra without faces larger than a hexagon. For all vertex
numbers not mentioned, no face-regular polyhedra exist.
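As a quick consistency check between Table 3 and List 1, the face-regular counts of Table 3 should add up to the 64 polyhedra of List 1. A short sketch (the dictionary simply transcribes the third column of Table 3):

```python
# vertices -> number of face-regular polyhedra, transcribed from Table 3
table3 = {
    4: 1, 6: 1, 8: 2, 10: 4, 12: 7, 14: 3, 16: 7, 18: 2, 20: 10,
    24: 6, 26: 2, 28: 3, 30: 1, 32: 5, 38: 1, 44: 1, 52: 1, 56: 2,
    60: 1, 68: 1, 80: 2, 140: 1,
}
total = sum(table3.values())
```

The total agrees with the number of entries announced for List 1.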


2 List 1: all 64 face-regular simple polyhedra with b ≤ 6

Among the 64 polyhedra of the List, the first three are regular, then there are 31
bifaced ones: six with b < 6, four $3_n$ (for n = 12, 16, 16, 28), nine $4_n$ (for n =
12, 14, 20, 20, 24, 26, 32, 32, 56) and 12 fullerenes $5_n$ (which are $F_{24}(D_{6d})$, $F_{28}(T_d)$,
$F_{32}(D_{3h})$, $F_{38}(C_{3v})$, $F_{44}(T)$, $F_{52}(T)$, $F_{56}(T_d)$, $F_{60}(I_h)$, $F_{68}(T_d)$, $F_{80}(I_h)$, $F_{80}(D_{5h})$,
$F_{140}(I)$). Nrs. 35-57 have three types of faces and the last seven polyhedra, Nrs. 58-64,
have four types of faces.

Among the 64 polyhedra of List 1, three are regular (Tetrahedron,
Cube and Dodecahedron), five are semi-regular (3-, 5-, 6-gonal prisms, truncated
octahedron and truncated icosahedron) and none is regular-faced from the list of
92 in [Joh66]. But there are three which are dual to the regular-faced snub disphenoid,
3-augmented 3-gonal prism and gyroelongated square dipyramid (these three have
numbers 84, 51 and 17, respectively, in the list of [Joh66]). Together with the three
regular ones and the 3-, 5-gonal prisms, this gives the duals of all eight convex deltahedra.


Nr.1 v = 4
p3 = 4 : 3
Groupsize: 24
Group: Td

Nr.2 v = 8
p4 = 6 : 0,4
Groupsize: 48
Group: Oh


Nr.3 v = 20
p5 = 12 : 0,0,5
Groupsize: 120
Group: Ih

Nr.4 v = 6
p4 = 3 : 2,2
p3 = 2 : 0,3
Groupsize: 12
Group: D3h

Nr.5 v = 12
p5 = 6 : 1,0,4
p3 = 2 : 0,0,3
Groupsize: 12
Group: D3d

Nr.6 v = 10
p5 = 2 : 0,5,0
p4 = 5 : 0,2,2
Groupsize: 20
Group: D5h

Nr.7 v = 12
p5 = 4 : 0,3,2
p4 = 4 : 0,1,3
Groupsize: 8
Group: D2d

Nr.8 v = 14
p5 = 6 : 0,2,3
p4 = 3 : 0,0,4
Groupsize: 12
Group: D3h

Nr.9 v = 16
p5 = 8 : 0,1,4
p4 = 2 : 0,0,4
Groupsize: 16
Group:

Nr.10 v = 12
p6 = 4 : 3,0,0,3
p3 = 4 : 0,0,0,3
Groupsize: 24
Group:

Nr.11 v = 16
p6 = 6 : 2,0,0,4
p3 = 4 : 0,0,0,3
Groupsize: 24
Group: Td

Nr.12 v = 16
p6 = 6 : 2,0,0,4
p3 = 4 : 0,0,0,3
Groupsize: 8
Group: D2h

Nr.13 v = 28
p6 = 12 : 1,0,0,5
p3 = 4 : 0,0,0,3
Groupsize: 12
Group: T

Nr.14 v = 12
p6 = 2 : 0,6,0,0
p4 = 6 : 0,2,0,2
Groupsize: 24
Group: D6h

Nr.15 v = 14
p6 = 3 : 0,4,0,2
p4 = 6 : 0,2,0,2
Groupsize: 12
Group: D3h

Nr.16 v = 20
p6 = 6 : 0,2,0,4
p4 = 6 : 0,2,0,2
Groupsize: 12
Group: D3d


Nr.17 v = 20
p6 = 6 : 0,3,0,3
p4 = 6 : 0,1,0,3
Groupsize: 6
Group: S6

Nr.18 v = 24
p6 = 8 : 0,3,0,3
p4 = 6 : 0,0,0,4
Groupsize: 48
Group: Oh

Nr.19 v = 26
p6 = 9 : 0,2,0,4
p4 = 6 : 0,1,0,3
Groupsize: 12
Group: D3h

Nr.20 v = 32
p6 = 12 : 0,2,0,4
p4 = 6 : 0,0,0,4
Groupsize: 12
Group: D3d

Nr.21 v = 32
p6 = 12 : 0,2,0,4
p4 = 6 : 0,0,0,4
Groupsize: 48
Group: Oh

Nr.22 v = 56
p6 = 24 : 0,1,0,5
p4 = 6 : 0,0,0,4
Groupsize: 24
Group: O


Nr.23 v = 24
p6 = 2 : 0,0,6,0
p5 = 12 : 0,0,4,1
Groupsize: 24
Group: D6d

Nr.24 v = 28
p6 = 4 : 0,0,6,0
p5 = 12 : 0,0,3,2
Groupsize: 24
Group: Td

Nr.25 v = 32
p6 = 6 : 0,0,4,2
p5 = 12 : 0,0,3,2
Groupsize: 12
Group: D3h

Nr.26 v = 38
p6 = 9 : 0,0,4,2
p5 = 12 : 0,0,2,3
Groupsize: 6
Group: C3v

Nr.27 v = 44
p6 = 12 : 0,0,3,3
p5 = 12 : 0,0,2,3
Groupsize: 12
Group: T

Nr.28 v = 52
p6 = 16 : 0,0,3,3
p5 = 12 : 0,0,1,4
Groupsize: 12
Group: T

Nr.29 v = 56
p6 = 18 : 0,0,2,4
p5 = 12 : 0,0,2,3
Groupsize: 24
Group: Td

Nr.30 v = 60
p6 = 20 : 0,0,3,3
p5 = 12 : 0,0,0,5
Groupsize: 120
Group: Ih

Nr.31 v = 68
p6 = 24 : 0,0,2,4
p5 = 12 : 0,0,1,4
Groupsize: 24
Group: Td

Nr.32 v = 80
p6 = 30 : 0,0,2,4
p5 = 12 : 0,0,0,5
Groupsize: 120
Group: Ih

Nr.33 v = 80
p6 = 30 : 0,0,2,4
p5 = 12 : 0,0,0,5
Groupsize: 20
Group: D5h


Nr.34 v = 140
p6 = 60 : 0,0,1,5
p5 = 12 : 0,0,0,5
Groupsize: 60
Group: I

Nr.35 v = 8
p5 = 2 : 2,2,1
p4 = 2 : 1,1,2
p3 = 2 : 0,1,2
Groupsize: 4
Group: C2v

Nr.36 v = 10
p5 = 3 : 1,2,2
p4 = 3 : 0,2,2
p3 = 1 : 0,0,3
Groupsize: 6
Group: C3v

Nr.37 v = 18
p6 = 6 : 1,2,0,3
p4 = 3 : 0,0,0,4
p3 = 2 : 0,0,0,3
Groupsize: 12
Group: D3h

Nr.38 v = 10
p6 = 1 : 3,0,3,0
p5 = 3 : 2,0,2,1
p3 = 3 : 0,0,2,1
Groupsize: 6
Group: C3v

Nr.39 v = 14
p6 = 3 : 2,0,2,2
p5 = 3 : 1,0,2,2
p3 = 3 : 0,0,1,2
Groupsize: 6
Group: C3v

Nr.40 v = 24
p6 = 6 : 1,0,3,2
p5 = 6 : 0,0,2,3
p3 = 2 : 0,0,0,3
Groupsize: 12
Group: D3d

Nr.41 v = 24
p6 = 6 : 0,0,2,4
p5 = 6 : 1,0,2,2
p3 = 2 : 0,0,3,0
Groupsize: 12
Group: D3h


Nr.42 v = 20
p6 = 6 : 1,0,2,3
p5 = 3 : 1,0,0,4
p3 = 3 : 0,0,1,2
Groupsize: 6
Group: C3h

Nr.43 v = 18
p6 = 3 : 0,0,4,2
p5 = 6 : 1,0,2,2
p3 = 2 : 0,0,3,0
Groupsize: 12
Group: D3h

Nr.44 v = 16
p6 = 2 : 0,2,4,0
p5 = 4 : 0,2,1,2
p4 = 4 : 0,1,2,1
Groupsize: 8
Group: D2h

Nr.45 v = 20
p6 = 3 : 0,2,4,0
p5 = 6 : 0,1,2,2
p4 = 3 : 0,0,2,2
Groupsize: 12
Group: D3h

Nr.46 v = 20
p6 = 4 : 0,2,3,1
p5 = 4 : 0,2,0,3
p4 = 4 : 0,0,2,2
Groupsize: 8
Group: D2h

Nr.47 v = 20
p6 = 4 : 0,1,3,2
p5 = 4 : 0,2,0,3
p4 = 4 : 0,1,2,1
Groupsize: 8
Group: D2d

Nr.48 v = 20
p6 = 4 : 0,2,3,1
p5 = 4 : 0,1,1,3
p4 = 4 : 0,1,1,2
Groupsize: 4
Group: C2v

Nr.49 v = 20
p6 = 4 : 0,2,2,2
p5 = 4 : 0,2,1,2
p4 = 4 : 0,0,2,2
Groupsize: 8
Group: D2d

Nr.50 v = 20
p6 = 4 : 0,2,2,2
p5 = 4 : 0,2,1,2
p4 = 4 : 0,0,2,2
Groupsize: 8
Group: D2h

Nr.51 v = 24
p6 = 4 : 0,0,4,2
p5 = 8 : 0,1,2,2
p4 = 2 : 0,0,4,0
Groupsize: 16
Group: D4h

Nr.52 v = 24
p6 = 6 : 0,2,2,2
p5 = 4 : 0,1,1,3
p4 = 4 : 0,0,1,3
Groupsize: 4
Group: D2

Nr.53 v = 26
p6 = 6 : 0,1,3,2
p5 = 6 : 0,1,1,3
p4 = 3 : 0,0,2,2
Groupsize: 6
Group: D3

Nr.54 v = 28
p6 = 8 : 0,1,1,4
p5 = 4 : 0,2,1,2
p4 = 4 : 0,0,2,2
Groupsize: 4
Group: D2

Nr.55 v = 30
p6 = 10 : 0,2,1,3
p5 = 2 : 0,0,0,5
p4 = 5 : 0,0,0,4
Groupsize: 20
Group: D5h

Nr.56 v = 32
p6 = 8 : 0,0,2,4
p5 = 8 : 0,1,2,2
p4 = 2 : 0,0,4,0
Groupsize: 16
Group: D4h

Nr.57 v = 32
p6 = 8 : 0,1,3,2
p5 = 8 : 0,0,2,3
p4 = 2 : 0,0,0,4
Groupsize: 16
Group: D4d




Nr.58 v = 10
p6 = 1 : 2,2,2,0
p5 = 2 : 1,2,1,1
p4 = 2 : 1,0,2,1
p3 = 2 : 0,1,1,1
Groupsize: 2
Group: C2

Nr.59 v = 12
p6 = 1 : 2,0,4,0
p5 = 4 : 1,1,2,1
p4 = 1 : 0,0,4,0
p3 = 2 : 0,0,2,1
Groupsize: 4
Group: C2v

Nr.60 v = 12
p6 = 2 : 1,2,2,1
p5 = 2 : 1,1,1,2
p4 = 2 : 1,0,1,2
p3 = 2 : 0,1,1,1
Groupsize: 2
Group: C2

Nr.61 v = 12
p6 = 2 : 2,1,2,1
p5 = 2 : 1,2,0,2
p4 = 2 : 0,1,2,1
p3 = 2 : 0,0,1,2
Groupsize: 4
Group: C2v


Nr.62 v = 16
p6 = 3 : 0,2,2,2
p5 = 3 : 1,0,2,2
p4 = 3 : 0,2,0,2
p3 = 1 : 0,0,3,0
Groupsize: 6
Group: C3v

Nr.63 v = 16
p6 = 3 : 1,2,1,2
p5 = 3 : 0,2,2,1
p4 = 3 : 0,0,2,2
p3 = 1 : 0,0,0,3
Groupsize: 6
Group:

Nr.64 v = 16
p6 = 4 : 1,1,2,2
p5 = 2 : 0,1,0,4
p4 = 2 : 1,0,1,2
p3 = 2 : 0,1,0,2
Groupsize: 4
Group: C2h

3 List 2: all 160 face-regular simple polyhedra
with b = 7 and up to 24 faces

Nr.1 v = 20
p7 = 6 : 3,0,0,0,4
p3 = 6 : 0,0,0,0,3
Groupsize: 12

Nr.2 v = 36
p7 = 12 : 2,0,0,0,5
p3 = 8 : 0,0,0,0,3
Groupsize: 6

Nr.3 v = 36
p7 = 12 : 2,0,0,0,5
p3 = 8 : 0,0,0,0,3
Groupsize: 24

Nr.4 v = 14
p7 = 2 : 0,7,0,0,0
p4 = 7 : 0,2,0,0,2
Groupsize: 28

Nr.5 v = 44
p7 = 12 : 0,3,0,0,4
p4 = 12 : 0,1,0,0,3
Groupsize: 24

Nr.6 v = 44
p7 = 12 : 0,3,0,0,4
p4 = 12 : 0,1,0,0,3
Groupsize: 6

Nr.7 v = 44
p7 = 12 : 0,2,0,0,5
p4 = 12 : 0,2,0,0,2
Groupsize: 12

Nr.8 v = 28
p7 = 2 : 0,0,7,0,0
p5 = 14 : 0,0,4,0,1
Groupsize: 28


Nr.9 v = 44
p7 = 6 : 0,0,6,0,1
p5 = 18 : 0,0,3,0,2
Groupsize: 12

Nr.10 v = 20
p7 = 4 : 2,0,3,0,2
p5 = 4 : 1,0,1,0,3
p3 = 4 : 0,0,1,0,2
Groupsize: 4

Nr.11 v = 20
p7 = 4 : 2,0,3,0,2
p5 = 4 : 1,0,1,0,3
p3 = 4 : 0,0,1,0,2
Groupsize: 4

Nr.12 v = 24
p7 = 4 : 0,4,0,1,2
p6 = 2 : 0,4,0,0,2
p4 = 8 : 0,1,0,1,2
Groupsize: 8

Nr.13 v = 24
p7 = 6 : 1,2,0,0,4
p4 = 6 : 0,2,0,0,2
p3 = 2 : 0,0,0,0,3
Groupsize: 4

Nr.14 v = 24
p7 = 6 : 1,3,0,0,3
p4 = 6 : 0,1,0,0,3
p3 = 2 : 0,0,0,0,3
Groupsize: 6

Nr.15 v = 24
p7 = 6 : 2,0,0,1,4
p6 = 2 : 3,0,0,0,3
p3 = 6 : 0,0,0,1,2
Groupsize: 12

Nr.16 v = 26
p7 = 6 : 2,0,0,2,3
p6 = 3 : 2,0,0,0,4
p3 = 6 : 0,0,0,1,2
Groupsize: 4

Nr.17 v = 26
p7 = 6 : 2,0,0,2,3
p6 = 3 : 2,0,0,0,4
p3 = 6 : 0,0,0,1,2
Groupsize: 12

Nr.18 v = 26
p7 = 6 : 2,0,0,2,3
p6 = 3 : 2,0,0,0,4
p3 = 6 : 0,0,0,1,2
Groupsize: 4

Nr.19 v = 26
p7 = 6 : 2,0,0,2,3
p6 = 3 : 2,0,0,0,4
p3 = 6 : 0,0,0,1,2
Groupsize: 12

Nr.20 v = 28
p7 = 4 : 0,2,4,0,1
p5 = 8 : 0,1,2,0,2
p4 = 4 : 0,0,2,0,2
Groupsize: 8


Nr.21 v = 28
p7 = 4 : 0,4,0,3,0
p6 = 4 : 0,2,0,1,3
p4 = 8 : 0,1,0,1,2
Groupsize: 8

Nr.22 v = 32
p7 = 6 : 0,2,3,0,2
p5 = 6 : 0,0,2,0,3
p4 = 6 : 0,2,0,0,2
Groupsize: 12

Nr.23 v = 32
p7 = 6 : 0,2,1,0,4
p5 = 6 : 0,2,2,0,1
p4 = 6 : 0,0,2,0,2
Groupsize: 12

Nr.24 v = 32
p7 = 6 : 0,2,3,0,2
p5 = 6 : 0,2,0,0,3
p4 = 6 : 0,0,2,0,2
Groupsize: 12

Nr.25 v = 32
p7 = 6 : 0,1,3,0,3
p5 = 6 : 0,2,0,0,3
p4 = 6 : 0,1,2,0,1
Groupsize: 6

Nr.26 v = 32
p7 = 6 : 0,3,2,0,2
p5 = 6 : 0,1,2,0,2
p4 = 6 : 0,0,1,0,3
Groupsize: 12

Nr.27 v = 32
p7 = 6 : 0,2,3,0,2
p5 = 6 : 0,1,1,0,3
p4 = 6 : 0,1,1,0,2
Groupsize: 6

Nr.28 v = 32
p7 = 6 : 0,2,2,0,3
p5 = 6 : 0,2,1,0,2
p4 = 6 : 0,0,2,0,2
Groupsize: 12

Nr.29 v = 32
p7 = 6 : 0,2,2,0,3
p5 = 6 : 0,2,1,0,2
p4 = 6 : 0,0,2,0,2
Groupsize: 4


Nr.30 v = 32
p7 = 6 : 0,2,2,0,3
p5 = 6 : 0,2,1,0,2
p4 = 6 : 0,0,2,0,2
Groupsize: 12

Nr.31 v = 32
p7 = 6 : 0,2,2,0,3
p5 = 6 : 0,2,1,0,2
p4 = 6 : 0,0,2,0,2
Groupsize: 4

Nr.32 v = 32
p7 = 6 : 2,0,0,3,2
p6 = 6 : 1,0,0,2,3
p3 = 6 : 0,0,0,1,2
Groupsize: 6

Nr.33 v = 32
p7 = 6 : 2,0,0,3,2
p6 = 6 : 1,0,0,2,3
p3 = 6 : 0,0,0,1,2
Groupsize: 12

Nr.34 v = 32
p7 = 6 : 2,0,0,3,2
p6 = 6 : 1,0,0,2,3
p3 = 6 : 0,0,0,1,2
Groupsize: 6

Nr.35 v = 32
p7 = 6 : 1,0,0,2,4
p6 = 6 : 2,0,0,2,2
p3 = 6 : 0,0,0,2,1
Groupsize: 12

Nr.36 v = 32
p7 = 6 : 1,0,0,3,3
p6 = 6 : 2,0,0,1,3
p3 = 6 : 0,0,0,2,1
Groupsize: 6

Nr.37 v = 36
p7 = 4 : 0,2,0,4,1
p6 = 8 : 0,2,0,2,2
p4 = 8 : 0,1,0,2,1
Groupsize: 8

Nr.38 v = 36
p7 = 6 : 1,0,4,0,2
p5 = 12 : 0,0,3,0,2
p3 = 2 : 0,0,0,0,3
Groupsize: 12

Nr.39 v = 38
p7 = 6 : 0,0,4,0,3
p5 = 12 : 0,1,2,0,2
p4 = 3 : 0,0,4,0,0
Groupsize: 12

Nr.40 v = 38
p7 = 6 : 0,3,0,2,2
p6 = 6 : 0,3,0,1,2
p4 = 9 : 0,0,0,2,2
Groupsize: 12

Nr.41 v = 40
p7 = 10 : 0,3,1,0,3
p5 = 2 : 0,0,0,0,5
p4 = 10 : 0,1,0,0,3
Groupsize: 10


Nr.42 v = 40
p7 = 12 : 1,2,0,0,4
p4 = 6 : 0,0,0,0,4
p3 = 4 : 0,0,0,0,3
Groupsize: 6

Nr.43 v = 40
p7 = 12 : 1,2,0,0,4
p4 = 6 : 0,0,0,0,4
p3 = 4 : 0,0,0,0,3
Groupsize: 24

Nr.44 v = 40
p7 = 12 : 2,0,0,1,4
p6 = 2 : 0,0,0,0,6
p3 = 8 : 0,0,0,0,3
Groupsize: 8


Nr.45 v = 42
p7 = 2 : 0,0,7,0,0
p6 = 7 : 0,0,4,2,0
p5 = 14 : 0,0,2,2,1
Groupsize: 28

Nr.46 v = 42
p7 = 2 : 0,0,0,7,0
p6 = 14 : 0,2,0,3,1
p4 = 7 : 0,0,0,4,0
Groupsize: 28

Nr.47 v = 44
p7 = 4 : 0,0,4,2,1
p6 = 4 : 0,0,4,0,2
p5 = 16 : 0,0,3,1,1
Groupsize: 8

Nr.48 v = 44
p7 = 6 : 1,0,0,4,2
p6 = 12 : 1,0,0,3,2
p3 = 6 : 0,0,0,2,1
Groupsize: 6

Nr.49 v = 44
p7 = 12 : 1,0,2,0,4
p5 = 6 : 1,0,0,0,4
p3 = 6 : 0,0,1,0,2
Groupsize: 6

Nr.50 v = 44
p7 = 12 : 1,0,2,0,4
p5 = 6 : 1,0,0,0,4
p3 = 6 : 0,0,1,0,2
Groupsize: 6

Nr.51 v = 44
p7 = 12 : 1,0,2,0,4
p5 = 6 : 1,0,0,0,4
p3 = 6 : 0,0,1,0,2
Groupsize: 2


Nr.52 v = 16
p7 = 2 : 1,2,4,0,0
p5 = 4 : 1,1,1,0,2
p4 = 2 : 0,0,2,0,2
p3 = 2 : 0,0,2,0,1
Groupsize: 4

Nr.53 v = 16
p7 = 2 : 2,0,2,2,1
p6 = 2 : 2,0,1,1,2
p5 = 2 : 2,0,0,1,2
p3 = 4 : 0,0,1,1,1
Groupsize: 4

Nr.54 v = 16
p7 = 3 : 2,2,0,1,2
p6 = 1 : 0,3,0,0,3
p4 = 3 : 1,0,0,1,2
p3 = 3 : 0,1,0,0,2
Groupsize: 6

Nr.55 v = 16
p7 = 3 : 2,2,0,1,2
p6 = 1 : 3,0,0,0,3
p4 = 3 : 0,2,0,0,2
p3 = 3 : 0,0,0,1,2
Groupsize: 6

Nr.56 v = 20
p7 = 2 : 1,2,0,4,0
p6 = 4 : 1,2,0,1,2
p4 = 4 : 0,1,0,2,1
p3 = 2 : 0,0,0,2,1
Groupsize: 4

Nr.57 v = 20
p7 = 2 : 2,0,1,4,0
p6 = 4 : 1,0,1,2,2
p5 = 2 : 2,0,0,2,1
p3 = 4 : 0,0,1,1,1
Groupsize: 4

Nr.58 v = 20
p7 = 3 : 1,2,0,2,2
p6 = 3 : 2,0,0,2,2
p4 = 3 : 0,2,0,0,2
p3 = 3 : 0,0,0,2,1
Groupsize: 6

Nr.59 v = 20
p7 = 3 : 1,2,0,2,2
p6 = 3 : 1,1,0,2,2
p4 = 3 : 1,0,0,1,2
p3 = 3 : 0,1,0,1,1
Groupsize: 3


Nr.60 v = 24
p7 = 3 : 1,0,4,2,0
p6 = 2 : 0,0,3,0,3
p5 = 6 : 1,0,1,1,2
p3 = 3 : 0,0,2,0,1
Groupsize: 6

Nr.61 v = 24
p7 = 4 : 1,0,3,1,2
p6 = 2 : 2,0,2,0,2
p5 = 4 : 1,0,0,1,3
p3 = 4 : 0,0,1,1,1
Groupsize: 4

Nr.62 v = 24
p7 = 4 : 1,0,3,1,2
p6 = 2 : 2,0,2,0,2
p5 = 4 : 1,0,0,1,3
p3 = 4 : 0,0,1,1,1
Groupsize: 4

Nr.63 v = 24
p7 = 4 : 2,1,0,3,1
p6 = 4 : 1,1,0,1,3
p4 = 2 : 0,0,0,2,2
p3 = 4 : 0,0,0,1,2
Groupsize: 4

Nr.64 v = 26
p7 = 6 : 1,1,2,0,3
p5 = 3 : 0,1,0,0,4
p4 = 3 : 1,0,1,0,2
p3 = 3 : 0,1,0,0,2
Groupsize: 6

Nr.65 v = 28
p7 = 2 : 2,0,1,4,0
p6 = 8 : 1,0,1,3,1
p5 = 2 : 0,0,0,4,1
p3 = 4 : 0,0,0,2,1
Groupsize: 4

Nr.66 v = 28
p7 = 4 : 1,0,2,3,1
p6 = 4 : 1,0,2,0,3
p5 = 4 : 1,0,0,2,2
p3 = 4 : 0,0,1,1,1
Groupsize: 4


Nr.67 v = 28
p7 = 4 : 1,0,2,3,1
p6 = 4 : 1,0,2,0,3
p5 = 4 : 1,0,0,2,2
p3 = 4 : 0,0,1,1,1
Groupsize: 4

Nr.68 v = 28
p7 = 4 : 2,0,2,2,1
p6 = 4 : 1,0,2,1,2
p5 = 4 : 0,0,1,2,2
p3 = 4 : 0,0,0,1,2
Groupsize: 8

Nr.69 v = 28
p7 = 4 : 1,0,1,3,2
p6 = 4 : 1,0,2,0,3
p5 = 4 : 1,0,1,2,1
p3 = 4 : 0,0,1,1,1
Groupsize: 4

Nr.70 v = 28
p7 = 4 : 1,0,1,3,2
p6 = 4 : 1,0,2,0,3
p5 = 4 : 1,0,1,2,1
p3 = 4 : 0,0,1,1,1
Groupsize: 4

Nr.71 v = 28
p7 = 4 : 1,0,3,2,1
p6 = 4 : 2,0,1,1,2
p5 = 4 : 0,0,1,1,3
p3 = 4 : 0,0,0,2,1
Groupsize: 4

Nr.72 v = 28
p7 = 4 : 1,0,2,3,1
p6 = 4 : 1,0,1,1,3
p5 = 4 : 1,0,1,1,2
p3 = 4 : 0,0,1,1,1
Groupsize: 2

Nr.73 v = 28
p7 = 4 : 1,0,2,3,1
p6 = 4 : 1,0,1,1,3
p5 = 4 : 1,0,1,1,2
p3 = 4 : 0,0,1,1,1
Groupsize: 4

Nr.74 v = 28
p7 = 4 : 1,0,2,3,1
p6 = 4 : 1,0,1,1,3
p5 = 4 : 1,0,1,1,2
p3 = 4 : 0,0,1,1,1
Groupsize: 4

Nr.75 v = 28
p7 = 4 : 1,0,2,2,2
p6 = 4 : 1,0,2,1,2
p5 = 4 : 1,0,0,2,2
p3 = 4 : 0,0,1,1,1
Groupsize: 4


Nr.76 v = 28
p7 = 4 : 2,0,2,2,1
p6 = 4 : 0,0,2,2,2
p5 = 4 : 1,0,0,2,2
p3 = 4 : 0,0,1,0,2
Groupsize: 8

Nr.77 v = 28
p7 = 4 : 1,0,2,2,2
p6 = 4 : 1,0,2,1,2
p5 = 4 : 1,0,0,2,2
p3 = 4 : 0,0,1,1,1
Groupsize: 4

Nr.78 v = 28
p7 = 4 : 2,0,2,2,1
p6 = 4 : 0,0,2,2,2
p5 = 4 : 1,0,0,2,2
p3 = 4 : 0,0,1,0,2
Groupsize: 8

Nr.79 v = 30
p7 = 6 : 1,1,3,0,2
p5 = 6 : 0,1,1,0,3
p4 = 3 : 0,0,2,0,2
p3 = 2 : 0,0,0,0,3
Groupsize: 6

Nr.80 v = 30
p7 = 6 : 0,2,2,0,3
p5 = 6 : 1,0,2,0,2
p4 = 3 : 0,0,0,0,4
p3 = 2 : 0,0,3,0,0
Groupsize: 12

Nr.81 v = 30
p7 = 6 : 1,2,0,2,2
p6 = 3 : 0,2,0,0,4
p4 = 6 : 0,1,0,1,2
p3 = 2 : 0,0,0,0,3
Groupsize: 12

Nr.82 v = 32
p7 = 3 : 0,2,1,4,0
p6 = 6 : 0,1,1,2,2
p5 = 3 : 0,2,0,2,1
p4 = 6 : 0,1,1,1,1
Groupsize: 6

Nr.83 v = 32
p7 = 3 : 1,0,2,4,0
p6 = 6 : 1,0,2,1,2
p5 = 6 : 0,0,2,2,1
p3 = 3 : 0,0,0,2,1
Groupsize: 6

Nr.84 v = 32
p7 = 4 : 0,1,4,0,2
p6 = 2 : 0,2,4,0,0
p5 = 8 : 0,1,1,1,2
p4 = 4 : 0,0,2,1,1
Groupsize: 8

Nr.85 v = 32
p7 = 4 : 0,2,4,1,0
p6 = 2 : 0,0,4,0,2
p5 = 8 : 0,1,1,1,2
p4 = 4 : 0,0,2,0,2
Groupsize: 8

Nr.86 v = 32
p7 = 4 : 1,0,0,4,2
p6 = 8 : 1,1,0,2,2
p4 = 2 : 0,0,0,4,0
p3 = 4 : 0,0,0,2,1
Groupsize: 8

Nr.87 v = 32
p7 = 8 : 1,1,2,0,3
p5 = 4 : 1,0,0,0,4
p4 = 2 : 0,0,0,0,4
p3 = 4 : 0,0,1,0,2
Groupsize: 8


Nr.88 v = 32
p7 = 8 : 1,1,2,0,3
p5 = 4 : 1,0,0,0,4
p4 = 2 : 0,0,0,0,4
p3 = 4 : 0,0,1,0,2
Groupsize: 8

Nr.89 v = 32
p7 = 8 : 1,2,0,1,3
p6 = 2 : 2,0,0,0,4
p4 = 4 : 0,0,0,0,4
p3 = 4 : 0,0,0,1,2
Groupsize: 8

Nr.90 v = 32
p7 = 8 : 1,2,0,1,3
p6 = 2 : 2,0,0,0,4
p4 = 4 : 0,0,0,0,4
p3 = 4 : 0,0,0,1,2
Groupsize: 8


Nr.91 v = 32
p7 = 8 : 1,1,0,1,4
p6 = 2 : 0,2,0,0,4
p4 = 4 : 1,0,0,1,2
p3 = 4 : 0,1,0,0,2
Groupsize: 4

Nr.92 v = 36
p7 = 4 : 0,1,4,2,0
p6 = 4 : 0,2,2,0,2
p5 = 8 : 0,0,2,1,2
p4 = 4 : 0,1,0,2,1
Groupsize: 8

Nr.93 v = 36
p7 = 4 : 0,0,4,1,2
p6 = 4 : 0,2,2,1,1
p5 = 8 : 0,1,1,1,2
p4 = 4 : 0,0,2,2,0
Groupsize: 8

Nr.94 v = 36
p7 = 4 : 0,1,4,2,0
p6 = 4 : 0,1,2,1,2
p5 = 8 : 0,1,1,1,2
p4 = 4 : 0,0,2,1,1
Groupsize: 8

Nr.95 v = 36
p7 = 4 : 1,0,2,4,0
p6 = 8 : 1,0,1,2,2
p5 = 4 : 0,0,1,2,2
p3 = 4 : 0,0,0,2,1
Groupsize: 8

Nr.96 v = 36
p7 = 4 : 1,0,2,4,0
p6 = 8 : 1,0,1,2,2
p5 = 4 : 0,0,1,2,2
p3 = 4 : 0,0,0,2,1
Groupsize: 4

Nr.97 v = 36
p7 = 4 : 0,0,2,4,1
p6 = 8 : 1,0,1,2,2
p5 = 4 : 1,0,0,2,2
p3 = 4 : 0,0,1,2,0
Groupsize: 8

Nr.98 v = 36
p7 = 4 : 2,0,2,2,1
p6 = 8 : 0,0,1,4,1
p5 = 4 : 1,0,0,2,2
p3 = 4 : 0,0,1,0,2
Groupsize: 4

Nr.99 v = 36
p7 = 6 : 0,1,2,0,4
p6 = 2 : 0,3,3,0,0
p5 = 6 : 0,2,0,1,2
p4 = 6 : 0,0,2,1,1
Groupsize: 12


Nr.100 v = 36
p7 = 6 : 0,2,3,1,1
p6 = 2 : 0,0,3,0,3
p5 = 6 : 0,1,0,1,3
p4 = 6 : 0,1,1,0,2
Groupsize: 6

Nr.101 v = 36
p7 = 6 : 0,2,3,1,1
p6 = 2 : 0,3,0,0,3
p5 = 6 : 0,1,1,0,3
p4 = 6 : 0,0,1,1,2
Groupsize: 6

Nr.102 v = 36
p7 = 6 : 1,2,0,2,2
p6 = 6 : 0,2,0,2,2
p4 = 6 : 0,0,0,2,2
p3 = 2 : 0,0,0,0,3
Groupsize: 12


Nr.103 v = 36
p7 = 6 : 1,2,0,2,2
p6 = 6 : 0,2,0,2,2
p4 = 6 : 0,0,0,2,2
p3 = 2 : 0,0,0,0,3
Groupsize: 12

Nr.104 v = 36
p7 = 6 : 1,1,0,3,2
p6 = 6 : 0,2,0,1,3
p4 = 6 : 0,1,0,2,1
p3 = 2 : 0,0,0,0,3
Groupsize: 6

Nr.105 v = 36
p7 = 6 : 1,0,0,2,4
p6 = 6 : 0,2,0,2,2
p4 = 6 : 0,2,0,2,0
p3 = 2 : 0,0,0,0,3
Groupsize: 4

Nr.106 v = 36
p7 = 6 : 0,2,0,1,4
p6 = 6 : 1,2,0,2,1
p4 = 6 : 0,0,0,2,2
p3 = 2 : 0,0,0,3,0
Groupsize: 12

Nr.107 v = 36
p7 = 6 : 0,3,0,2,2
p6 = 6 : 1,1,0,2,2
p4 = 6 : 0,0,0,1,3
p3 = 2 : 0,0,0,3,0
Groupsize: 12




Nr.108 v = 36
p7 = 8 : 1,1,0,2,3
p6 = 4 : 0,1,0,1,4
p4 = 4 : 1,0,0,1,2
p3 = 4 : 0,1,0,0,2
Groupsize: 4

Nr.109 v = 36
p7 = 8 : 1,1,0,1,4
p6 = 4 : 1,2,0,1,2
p4 = 4 : 0,0,0,2,2
p3 = 4 : 0,0,0,1,2
Groupsize: 4

Nr.110 v = 38
p7 = 6 : 0,2,2,1,2
p6 = 3 : 0,0,4,0,2
p5 = 6 : 0,0,1,2,2
p4 = 6 : 0,2,0,0,2
Groupsize: 12

Nr.111 v = 38
p7 = 6 : 0,2,1,2,2
p6 = 3 : 0,0,0,2,4
p5 = 6 : 0,2,2,0,1
p4 = 6 : 0,0,2,0,2
Groupsize: 12

Nr.112 v = 38
p7 = 6 : 0,2,2,1,2
p6 = 3 : 0,2,2,0,2
p5 = 6 : 0,1,1,1,2
p4 = 6 : 0,0,1,1,2
Groupsize: 6

Nr.113 v = 38
p7 = 6 : 0,1,2,2,2
p6 = 3 : 0,0,2,0,4
p5 = 6 : 0,2,0,1,2
p4 = 6 : 0,1,2,0,1
Groupsize: 12

Nr.114 v = 38
p7 = 6 : 0,2,2,2,1
p6 = 3 : 0,2,0,0,4
p5 = 6 : 0,1,2,0,2
p4 = 6 : 0,0,1,1,2
Groupsize: 12


Nr.115 v = 40
p7 = 8 : 1,0,3,1,2
p6 = 2 : 2,0,0,0,4
p5 = 8 : 0,0,2,0,3
p3 = 4 : 0,0,0,1,2
Groupsize: 4

Nr.116 v = 42
p7 = 6 : 1,1,0,3,2
p6 = 9 : 0,2,0,2,2
p4 = 6 : 0,0,0,3,1
p3 = 2 : 0,0,0,0,3
Groupsize: 6

Nr.117 v = 44
p7 = 6 : 0,1,2,2,2
p6 = 6 : 0,2,2,0,2
p5 = 6 : 0,1,0,2,2
p4 = 6 : 0,0,1,2,1
Groupsize: 12

Nr.118 v = 44
p7 = 6 : 0,2,2,1,2
p6 = 6 : 0,1,2,2,1
p5 = 6 : 0,1,0,2,2
p4 = 6 : 0,0,1,1,2
Groupsize: 6

Nr.119 v = 44
p7 = 6 : 0,0,2,2,3
p6 = 6 : 0,2,2,0,2
p5 = 6 : 0,1,0,2,2
p4 = 6 : 0,1,1,2,0
Groupsize: 4

Nr.120 v = 44
p7 = 6 : 0,0,2,2,3
p6 = 6 : 0,2,2,0,2
p5 = 6 : 0,1,0,2,2
p4 = 6 : 0,1,1,2,0
Groupsize: 12


Nr. 121  t;  =  44 
P7  =  6  :  0,1, 2,3,1 
P6  =  6  :  0,0,1,2,3 

P5  =  6  :  0, 2,0,1, 2 
P4  =  6  :  0,1, 2,0,1 

Groupsize:  6 


Nr.l22  v  =  44 
P7  =  6  :  0,1,1,3,2 
P6  =  6  :  0,1, 1,1,3 

P5  =  6  ;  0,2, 1,1,1 
P4  =  6  :  0,0, 2, 1,1 

Groupsize:  6 


Nr.l23  v  =  44 

Pr  =  6  :  0,1,2,2,2 
P6  =  6  :  0,2,1, 1,2 
P5  =  6  :  0,1, 1,1,2 
P4  =  6  :  0,0, 1,2,1 

Groupsize:  6 


Nr.l24  V  =  44 

P7  =  6  :  0,1, 2, 2, 2 
P6  =  6  :  0,2,2, 0,2 
P5  =  6  :  0,0,1,2,2 
P4  =  6  :  0, 1,0,2, 1 

Groupsize:  12 


Nr.125  v  =  44
P7  =  6  :  0,1,0,2,4

P6  =  6  :  0,2,2,0,2
P5  =  6  :  0,1,2,2,0
P4  =  6  :  0,0,1,2,1

Groupsize:  12


Nr.126  v  =  44
P7  =  6  :  0,1,2,3,1
P6  =  6  :  0,2,1,0,3

P5  =  6  :  0,0,2, 1,2 
P4  =  6  :  0,1,0,2,1 

Groupsize:  6 


Nr.127  v  =  44
P7  =  6  :  0,2,2,3,0
P6  =  6  :  0,1,0,2,3

P5  =  6  :  0,1,2,0,2

P4  =  6  :  0,0, 1,1,2 

Groupsize:  12 
Nr.128  v  =  44

P7  =  6  :  0,2,1,2,2

P6  =  6  :  0,0,3,1,2
P5  =  6  :  0,0,1,3,1

P4  =  6  :  0,2,0,0,2 

Groupsize:  6 


Nr.132  v  =  44
P7  =  6  :  0,0,2,2,3
P6  =  6  :  0,2,2,0,2

P5  =  6  :  0,1,0,2,2

P4  =  6  :  0,1,1,2,0 

Groupsize:  12 


Nr.136  v  =  44

P7  =  6  :  0,2,2,2,1
P6  =  6  :  0,2,1,1,2

P5  =  6  :  0,0,2,1,2

P4  =  6  :  0,0,0,2,2

Groupsize:  12 


Nr.129  v  =  44
P7  =  6  :  0,2,2,3,0 
P6  =  6  :  0,1,1,1,3 

P5  =  6  :  0,0,2,1,2 
P4  =  6  :  0,1,0,1,2 

Groupsize:  6 


Nr.133  v  =  44

P7  =  6  :  0,2,2,1,2

P6  =  6  :  0,0,3,2,1
P5  =  6  :  0,0,0,3,2

P4  =  6  :  0,2,0,0,2 

Groupsize:  12 


Nr.137  v  =  44
P7  =  6  :  0,1, 1,3, 2 
P6  =  6  :  0,1,1, 1,3 
P5  =  6  :  0,2,1, 1,1 

P4  =  6  :  0,0, 2,1,1 

Groupsize:  6 


Nr.130  v  =  44
P7  =  6  :  0,0,1,3,3
P6  =  6  :  0,1,2,0,3

P5  =  6  :  0,2,0,2,1

P4  =  6  :  0,1, 2,1,0 

Groupsize:  6 


Nr.134  v  =  44
P7  =  6  :  0,2,1,3,1
P6  =  6  :  0,1,2,0,3

P5  =  6  :  0,0,2,2,1
P4  =  6  :  0,1,0,1,2 

Groupsize:  6 


Nr.138  v  =  44
P7  =  6  :  0,1,1,3,2
P6  =  6  :  0,1,1,1,3
P5  =  6  :  0,2,1,1,1
P4  =  6  :  0,0, 2,1,1 
Groupsize:  2 


Nr.131  v  =  44

P7  =  6  :  0,2,2,1,2
P6  =  6  :  0,1,2,2,1
P5  =  6  :  0,1,0,2,2
P4  =  6  :  0,0,1,1,2 
Groupsize:  6 


Nr. 135  V  =  44 

P7  =  6  :  0,2,2,2,1 

P6  =  6  :  0, 1,1,2, 2 
P5  =  6  :  0,1,1,1,2
P4  =  6  :  0,0,1,1,2 
Groupsize:  6 


Nr.139  v  =  44
P7  =  6  :  0,1,1,3,2
P6  =  6  :  0,1,1,1,3
P5  =  6  :  0,2,1,1,1
P4  =  6  :  0,0,2,1,1 

Groupsize:  2 




Nr.140  v  =  44

P7  =  6  :  0,2,1,2,2
P6  =  6  :  0,1,2, 1,2 
P5  =  6  :  0,1,1, 2,1 
P4  =  6  :  0,0,1, 1,2 

Groupsize:  6 


Nr.141  v  =  44
P7  =  6  :  0,2,1,2,2
P6  =  6  :  0,0, 0,4,2 

P5  =  6  :  0,2,2, 0,1 
P4  =  6  :  0,0, 2, 0,2 

Groupsize:  12 


Nr.142  v  =  44

P7  =  6  :  0,2,2,2,1
P6  =  6  :  0,0,2,2,2

P5  =  6  :  0,1,0,2,2

P4  =  6  :  0,1,1,0,2 

Groupsize:  12 


Nr.143  v  =  44
P7  =  6  :  0,2, 2, 3,0 
P6  =  6  :  0,2, 1,0,3 

P5  =  6  :  0,0,2,1,2 
P4  =  6  :  0,0,0,2,2 

Groupsize:  12 


Nr.144  v  =  44
P7  =  6  :  0,1,3,1,2 

P6  =  6  :  0,2, 1,2,1 

P5  =  6  :  0,1,0,1,3 

P4  =  6  :  0,0,1,2,1 

Groupsize:  6 


Nr. 145  V  =  44 
P7  =  6  :  0,1, 3,1,2 

P6  =  6  :  0,2, 1,2,1 

P5  =  6  :  0,1,0,1,3 

P4  =  6  :  0,0, 1,2,1 

Groupsize:  6 


Nr.146  v  =  44
P7  =  6  :  0, 0,2,2, 3 
P6  =  6  :  0,2,2,0,2 
P5  =  6  :  0,1,0, 2, 2 
P4  =  6  :  0,1, 1,2,0 

Groupsize:  4 


Nr.147  v  =  44
P7  =  6  :  0,0,3,2,2 

P6  =  6  :  0,2,0,2,2 

P5  =  6  :  0,0,2,0,3 

P4  =  6  :  0,2,0,2,0 

Groupsize:  12 


Nr.148  v  =  20
P7  =  2  :  0,2,2,2,1
P6  =  2  :  1,0,2,1,2
P5  =  4  :  1,1,1,1,1
P4  =  2  :  0,0,2,0,2
P3  =  2  :  0,0,2,1,0

Groupsize:  4


Nr. 149  V  =  24 
P7  =  2  :  1,0, 2,4,0 
P6  =  4  :  0,1,1,2,2
P5  =  4  :  1,1,1,1,1

P4  =  2  :  0,0,2,2,0
P3  =  2  :  0,0, 2, 0,1 

Groupsize:  4 


Nr.150  V  =  24 
P7  =  3  :  1,2,2,0,2 

P6  =  1  :  0,0,6,0,0
P5  =  6  :  0,1, 2,1,1 

P4  =  3  :  0,0,2, 0,2 
P3  =  1  :  0,0,0, 0,3 
Groupsize:  6 


Nr.151  v  =  24
P7  =  4  :  1,1,1,2,2
P6  =  2  :  0,2,0,0,4
P5  =  2  :  1,2,0,0,2
P4  =  4  :  0,1,1,1,1
P3  =  2  :  0,0,1,0,2

Groupsize:  4
Nr.152  v  =  24
P7  =  4  :  1,2,1,1,2

P6  =  2  :  1,2,1,0,2

P5  =  2  :  0,2,0,1,2
P4  =  4  :  0,0,1,1,2
P3  =  2  :  0,0,0,1,2
Groupsize:  4 


Nr.153  v  =  28
P7  =  4  :  1,1,3,1,1
P6  =  2  :  0,1,3,0,2

P5  =  6  :  0,0,2,1,2

P4  =  2  :  1,0, 0,1,2 

P3  =  2  :  0,1,0,0,2 
Groupsize:  4 


Nr.154  v  =  28
P7  =  4  :  1,2,1,2,1
P6  =  4  :  0,2,1,1,2

P5  =  2  :  1,0,0,2,2

P4  =  4  :  0,0,0,2,2 

P3  =  2  :  0,0,1,0,2 

Groupsize:  4 


Nr.155  v  =  36

P7  =  6  :  0,1,2,2,2

P6  =  3  :  0,2,0,0,4

P5  =  6  :  1,0,2,0,2 

P4  =  3  :  0,0,0,2,2 
P3  =  2  :  0,0,3, 0,0 
Groupsize:  12 


Nr.156  v  =  38
P7  =  6  :  0,1,1,2,3

P6  =  6  :  1,1,1,1,2

P5  =  3  :  1,0,0,2,2
P4  =  3  :  0,0,0,2,2
P3  =  3  :  0,0, 1,2,0 
Groupsize:  6 


Nr.157  v  =  40
P7  =  8  :  1,0,1,1,4
P6  =  4  :  1,1,2,0,2
P5  =  4  :  0,1,0,2,2 

P4  =  2  :  0,0, 2,2,0 

P3  =  4  :  0,0, 0,1,2 
Groupsize:  4 


Nr.158  v  =  42
P7  =  6  :  0,2,0,2,3
P6  =  6  :  0,0,2,2,2

P5  =  6  :  1,0,2,2,0 

P4  =  3  :  0,0,0,0,4 
P3  =  2  :  0,0, 3, 0,0 
Groupsize:  12 


Nr.159  v  =  42
P7  =  6  :  0,1,3,2,1
P6  =  6  :  1,0,1,2,2
P5  =  6  :  0,1,0,1,3
P4  =  3  :  0,0,2, 0,2 
P3  =  2  :  0,0,0,3,0 
Groupsize:  6 


Nr. 160  V  =  42 

P7  =  6  :  1,0, 2, 2, 2 

P6  =  6  :  0,1,2,1,2

P5  =  6  :  0,1,0,2,2

P4  =  3  :  0,0,2, 2,0 
P3  =  2  :  0,0, 0,0,3 
Groupsize:  12 
4  List  3:  Selected  face-regular  simple  polyhedra
with  b  ≥  8


Nr.1  v  =  24
P9  =  2  :  3,0,0,6,0,0,0
P6  =  6  :  2,0,0,2,0,0,2
P3  =  6  :  0,0,0,2,0,0,1

Groupsize:  12


Nr.2  v  =  24
P9  =  2  :  0,3,6,0,0,0,0
P5  =  6  :  0,2,1,0,0,0,2
P4  =  6  :  0,1,2,0,0,0,1

Groupsize:  12


Nr.3  v  =  24
P9  =  2  :  3,0,0,6,0,0,0

P6  =  6  :  2,0,0,2,0,0,2
P3  =  6  :  0,0,0,2,0,0,1

Groupsize:  12 


Nr.4  v  =  20
P8  =  3  :  2,2,2,0,0,2
P5  =  3  :  0,1,2,0,0,2
P4  =  3  :  1,0,1,0,0,2
P3  =  3  :  0,1,0,0,0,2
Groupsize:  6 


Nr.5  v  =  24
P8  =  2  :  0,4,0,4,0,0
P6  =  4  :  0,4,0,0,0,2

P4  =  8  :  0,1,0, 2,0,1 

Groupsize:  16 


Nr.6  v  =  24
P8  =  2  :  2,2,0,0,4,0
P7  =  4  :  2,2,0,0,1,2
P4  =  4  :  0,1,0,0,2,1
P3  =  4  :  0,0,0,0,2,1
Groupsize:  8 


Nr.7  v  =  24
P8  =  2  :  2,1,2,1,2,0
P7  =  2  :  2,0,1,2,0,2
P6  =  2  :  2,0,0,1,2,1
P5  =  2  :  0,2,0,0,1,2
P4  =  2  :  0,1,2,0,0,1
P3  =  4  :  0,0,0,1,1,1

Groupsize:  4


Nr. 8  V  =  16 

P9  =  1  :  3, 3, 0,3,0, 0,0 
P6  =  3  :  1,2,0,2,0,0,1 
P4  =  3  :  1,0, 0,2, 0,0,1 
P3  =  3  :  0,1,0,1,0,0,1
Groupsize:  3 


P9  =  1  :  3,0,3,0,3,0,0 
P7  =  3  :  1,2, 2,1, 0,0,1 
P6  =  1  :  0,3,0,0,3,0,0
P5  =  3  :  1,1,0,0,2,0,1
P4  =  3  :  0,0,1,1,2,0,0
P3  =  3  :  0,0,1,0,1,0,1
Groupsize:  3 
5  List  4:  all  9  face-regular  4-valent  polyhedra
with  b  =  4


For the polyhedra in this list, the graph induced by the 4-gons is interesting: in Nrs
6, 7, 9 the graphs are two C4, C8 and the truncated Octahedron, respectively.


Nr.l  V  =  6 
P3  =  8:  3 
Groupsize:  48 
Group:  Oh 


Nr.2  v  =  8
P4  =  2:  4,0
P3  =  8:  2,1
Groupsize:  16
Group:  D4d


Nr.3  v  =  10
P4  =  4:  2,2
P3  =  8:  2,1
Groupsize:  16
Group:  D4h


Nr.4  v  =  12
P4  =  6:  4,0
P3  =  8:  0,3
Groupsize:  48
Group:  Oh


Nr.5  v  =  14
P4  =  8:  1,3
P3  =  8:  2,1
Groupsize:  16
Group:  D4h


Nr.6  v  =  14
P4  =  8:  2,2
P3  =  8:  1,2
Groupsize:  16
Group:  D4h


Nr.7  v  =  14
P4  =  8:  2,2
P3  =  8:  1,2
Groupsize:  8
Group:  D2d


Nr.8  v  =  22
P4  =  16:  1,3
P3  =  8:  1,2
Groupsize:  8
Group:  D2d


Nr.9  v  =  30
P4  =  24:  1,3
P3  =  8:  0,3
Groupsize:  24
Group:  O
6  List  5:  all  face-regular  4-valent  polyhedra  with
b  =  5  and  up  to  24  faces


Among the polyhedra of Lists 4 and 5 there are: the Octahedron, three semi-regular
ones (the 4- and 5-gonal antiprisms and the Cuboctahedron) and three regular-faced ones
(Nr. 3 of List 4 and Nrs. 4 and 6 of List 5, which are the elongated square dipyramid, the
pentagonal gyrobicupola and the pentagonal orthobicupola, having numbers 15, 31
and 30, respectively, in the list of 92 polyhedra in [Joh66]).

Nr. 2 in List 5 is the Octahedron truncated and capped on 4 vertices of an induced
C4. Nr. 3 is the elongated antiprism. Nr. 5 is the dual rhombic Icosahedron
(2-elongated 5-gonal antiprism).


Nr.1  v  =  10
P5  =  2:  5,0,0
P3  =  10:  2,0,1
Groupsize:  20
Group:  D5d


Nr.2  v  =  22
P5  =  8:  2,0,3
P3  =  16:  2,0,1
Groupsize:  16
Group:  D4h


Nr.3  v  =  15
P5  =  2:  5,0,0
P4  =  5:  4,0,0
P3  =  10:  0,2,1
Groupsize:  20
Group:  D5h


Nr.4  v  =  20
P5  =  2:  0,5,0
P4  =  10:  3,0,1
P3  =  10:  0,3,0
Groupsize:  20
Group:  D5d


Nr.5  v  =  20
P5  =  2:  5,0,0
P4  =  10:  2,2,0
P3  =  10:  0,2,1
Groupsize:  20
Group:  D5d


Nr.6  v  =  20
P5  =  2:  0,5,0
P4  =  10:  2,1,1
P3  =  10:  1,2,0
Groupsize:  20
Group:  D5h
7  List  6:  all  26  6R_j  Fullerenes

The polyhedra 1, 3, 6, 12, 16, 18-21, 24-26 of this list are face-regular. They are the
polyhedra 23-34 of List 1, respectively.


Nr.1  v  =  24
P6  =  2  :  0,0,6,0
Groupsize:  24


Nr.2  v  =  26
P6  =  3  :  0,0,6,0
Groupsize:  12


Nr.3  v  =  28
P6  =  4  :  0,0,6,0
Groupsize:  24


Nr.4  v  =  28
P6  =  4  :  0,0,5,1
Groupsize:  4


Nr.5  v  =  30
P6  =  5  :  0,0,4,2
Groupsize:  20


Nr.6  v  =  32
P6  =  6  :  0,0,4,2
Groupsize:  12


Nr.7  v  =  32
P6  =  6  :  0,0,5,1
Groupsize:  6


Nr.8  v  =  32
P6  =  6  :  0,0,4,2
Groupsize:  12


Nr.9  v  =  32
P6  =  6  :  0,0,4,2
Groupsize:  4


Nr.10  v  =  36
P6  =  8  :  0,0,4,2
Groupsize:  8


Nr.11  v  =  36
P6  =  8  :  0,0,3,3
Groupsize:  4


Nr.12  v  =  38
P6  =  9  :  0,0,4,2
Groupsize:  6


Nr.13  v  =  40
P6  =  10  :  0,0,2,4
Groupsize:  20


Nr.14  v  =  40
P6  =  10  :  0,0,4,2
Groupsize:  4


Nr.15  v  =  40
P6  =  10  :  0,0,4,2
Groupsize:  20


Nr.16  v  =  44
P6  =  12  :  0,0,3,3
Groupsize:  12
Nr.17  v  =  48
P6  =  14  :  0,0,3,3
Groupsize:  6


Nr.18  v  =  52
P6  =  16  :  0,0,3,3
Groupsize:  12


Nr.19  v  =  56
P6  =  18  :  0,0,2,4
Groupsize:  24


Nr.20  v  =  60
P6  =  20  :  0,0,3,3
Groupsize:  120


Nr.21  v  =  68
P6  =  24  :  0,0,2,4
Groupsize:  24


Nr.22  v  =  68
P6  =  24  :  0,0,2,4
Groupsize:  12


Nr.23  v  =  72
P6  =  26  :  0,0,2,4
Groupsize:  8


Nr.24  v  =  80
P6  =  30  :  0,0,2,4
Groupsize:  120


Nr.25  v  =  80
P6  =  30  :  0,0,2,4
Groupsize:  20


Nr.26  v  =  140
P6  =  60  :  0,0,1,5
Groupsize:  60


Face-regular k-valent bifaced polyhedra

In  this  section  we  will  study  the  case  where  faces  of  exactly  two  sizes  a  <  b  occur. 

Bifaced polyhedra and similar concepts are well studied, e.g. in [Mal70], [GM66],
[GZ74], [Go75], [Go77], [Za80], [JT84], [JT90], [JJ89], [Ow84], [Ow86], [J095].

Clearly, in this case the f-graph from the proof of Theorem 1 is the K2, so we
get the equation $p_b = p_a \frac{f(a,b)}{f(b,a)}$.

By using kv = 2e = a p_a + b p_b and the Euler formula v - e + (p_a + p_b) = 2, we get

$$p_a = \frac{4k}{(2k + 2a - ak) - (bk - 2b - 2k)\frac{f(a,b)}{f(b,a)}} \qquad (*)$$
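The counting behind (*) can be checked numerically. The following is a minimal sketch, assuming the standard double counting p_a f(a,b) = p_b f(b,a) together with kv = 2e = a p_a + b p_b and Euler's formula; the function name `bifaced_parameters` and its argument names are illustrative and do not come from the paper.

```python
from fractions import Fraction


def bifaced_parameters(k, a, b, f_ab, f_ba):
    """Return (p_a, p_b, v) for a k-valent face-regular (a, b)-polyhedron.

    f_ab is the number of b-gonal neighbours of every a-gon and f_ba the
    number of a-gonal neighbours of every b-gon.  The values are exact
    Fractions, so non-integer results expose impossible parameter sets.
    None is returned when the denominator of (*) is not positive.
    """
    ratio = Fraction(f_ab, f_ba)
    denominator = (2 * k + 2 * a - a * k) - (b * k - 2 * b - 2 * k) * ratio
    if denominator <= 0:
        return None
    p_a = Fraction(4 * k) / denominator
    p_b = p_a * ratio              # from p_a * f(a,b) = p_b * f(b,a)
    v = (a * p_a + b * p_b) / k    # from k*v = a*p_a + b*p_b
    return p_a, p_b, v


if __name__ == "__main__":
    # Icosidodecahedron: k = 4, (a, b) = (3, 5), f(3,5) = 3, f(5,3) = 5
    print(bifaced_parameters(4, 3, 5, 3, 5))
```

For the Icosidodecahedron the function returns 20 triangles, 12 pentagons and 30 vertices, in agreement with the values quoted later in the text.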

We  will  use  the  following  notation  for  operations  on  polyhedra: 

tetrakis:  the  tetrakis  of  a  polyhedron  is  obtained  by  putting  a  pyramid  on  4-gonal 
faces. 

4-triakon: the 4-triakon of a polyhedron is obtained by partitioning each triangle
into  a  ring  of  3  4-gons  by  putting  a  vertex  in  the  middle  and  connecting  it  to 
the  midpoint  of  every  edge  in  the  boundary. 

5-triakon: a 5-triakon of a polyhedron is obtained by partitioning each hexagon
into  a  ring  of  3  pentagons  by  putting  a  vertex  in  the  middle  and  connecting 
it  to  the  midpoint  of  every  second  edge  in  the  boundary.  (An  example  of 
two  different  face-regular  bifaced  polyhedra,  coming  both  as  a  5-triakon  of  the 
truncated  Octahedron,  is  given  in  Remark  1  after  the  Theorem  5  below.) 

Theorem 2 For k > 3 there is only one infinite series of face-regular (a, b)-polyhedra,
that is the Antiprisms APrism_b, for any b > 3.

Apart from this, all face-regular k-valent (a, b)-polyhedra have (k; a) = (4; 3) and
b = 4, 5, 6.

They are:

b = 4: 7 polyhedra, given as Nrs 3-9 in List 4;

b = 5: the Icosidodecahedron and Nr 2 in List 5 (the tetrakis of the Octahedron,
truncated on all but two opposite vertices);

b = 6: the tetrakis of the (fully) truncated Octahedron.

Theorem  2  will  follow  from  the  following  4  Lemmata: 

Remark 

Theorem 2 shows that the largest 4-valent face-regular (a, b)-polyhedra have 30
vertices (i.e. 32 faces) and a = 3. They have (p_3, b) ∈ {(8, 4), (20, 5), (24, 6)} and
are Nr 9 in List 4, the Icosidodecahedron and the tetrakis of the truncated Octahedron,
respectively. The largest 3-valent face-regular (3, b)-polyhedron also has 32 faces. It
is the fully truncated Dodecahedron with (p_3, b) = (20, 10).

As we will see below, all three largest 3-valent face-regular polyhedra have 140
vertices. They are the unique largest (4, b)-polyhedron (the 4-triakon of the truncated
Dodecahedron; so p = (p_4 = 60, p_15 = 12)) and both largest (5, b)-polyhedra: a
5-triakon of the truncated Icosahedron (so p = (p_5 = 60, p_10 = 12)) and the fullerene
C140(I) (the truncation of the dual snub Dodecahedron on all 12 5-valent vertices;
so p = (p_5 = 12, p_6 = 60)).
Lemma 1 The only possibilities for (k; a, b) are (5; 3, 4), (4; 3, b), (3; 3, b), (3; 4, b)
and (3; 5, b).

In all other cases the denominator in (*) will be non-positive even for
f(a,b)/f(b,a) = 1/b, the smallest possible value.
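The case distinction of Lemma 1 can be reproduced by a small search. This is only a sketch: it assumes the denominator of (*) in the form (2k + 2a - ak) - (bk - 2b - 2k) f(a,b)/f(b,a), evaluates it at the smallest ratio 1/b, and scans a finite range whose bounds (`max_k`, `max_a`, `max_b`) are arbitrary choices made here.

```python
from fractions import Fraction


def denominator_at_minimum(k, a, b):
    """Denominator of (*) when f(a,b)/f(b,a) takes its smallest value 1/b."""
    return (2 * k + 2 * a - a * k) - (b * k - 2 * b - 2 * k) * Fraction(1, b)


def admissible(max_k=8, max_a=8, max_b=40):
    """Map each pair (k, a) to the list of b > a with a positive denominator."""
    result = {}
    for k in range(3, max_k + 1):
        for a in range(3, max_a + 1):
            bs = [b for b in range(a + 1, max_b + 1)
                  if denominator_at_minimum(k, a, b) > 0]
            if bs:
                result[(k, a)] = bs
    return result


if __name__ == "__main__":
    for (k, a), bs in sorted(admissible().items()):
        label = "all b in range" if len(bs) > 1 else f"b = {bs[0]}"
        print(f"(k; a) = ({k}; {a}): {label}")
```

Within the scanned range only (k; a) = (3; 3), (3; 4), (3; 5), (4; 3) survive for every b, and (5; 3) survives for b = 4 alone, which is the list stated in the lemma.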

Lemma 2 The case (k; a, b) = (5; 3, 4) is not possible, so there is no face-regular
5-valent polyhedron.

Proof.

The denominator in (*) is positive only for (f(a,b), f(b,a)) = (1,4), (1,3).

In the first case any 4-gon is surrounded by 12 3-gons, which implies neighbouring
4-gons in the next layer - a contradiction. In the case (1,3), any pair of adjacent
4-gons is surrounded by 16 3-gons, so the next layer contains a 4-gon with 2 4-gonal
neighbours - again a contradiction.

□ 

In the following lemma we will exclude some of the theoretically possible
parameters for k = 4.

Lemma 3 All cases for (k; a) = (4; 3) not being contained in the following list are
impossible:

a): b = 4: the 8 polyhedra Nrs 2-9 in List 4;

b): b = 5: the Icosidodecahedron with f(3,5) = 3, f(5,3) = 5, v = 30;

c): b > 3: the infinite class of antiprisms APrism_b;

d): b > 3: (f(3,b), f(b,3)) = (1, b - 3), (p_3, p_b) = (8b - 24, 8), v = 8b - 18;

e): (f(3,b), f(b,3)) = (1, b - 2), (p_3, p_b) = (4b - 8, 4), v = 4b - 6.

Proof. For b > 4 the denominator in (*) is positive only if

1) (f(3,b), f(b,3)) ∈ {(1, b - 3), (1, b - 2), (1, b - 1), (1, b)}, or

2) b ∈ {5, 6, 7} and (f(3,b), f(b,3)) = (2, b), or

3) b ∈ {5, 6} and (f(3,b), f(b,3)) = (2, b - 1), or

4) b = 5 and (f(3,b), f(b,3)) ∈ {(2,3), (3,4), (3,5)}.
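The four cases above can be recovered by direct enumeration. The sketch below assumes that for (k; a) = (4; 3) the denominator of (*) equals 2 - (2b - 8) f(3,b)/f(b,3); the function name `lemma3_candidates` is introduced here for illustration only.

```python
def lemma3_candidates(b):
    """Pairs (f(3,b), f(b,3)) giving a positive denominator in (*) for (k; a) = (4; 3)."""
    pairs = []
    for f3 in range(1, 4):           # a triangle has at most 3 b-gonal neighbours
        for fb in range(1, b + 1):   # a b-gon has at most b triangular neighbours
            # 2 - (2b - 8) * f3 / fb > 0, multiplied through by fb > 0
            if 2 * fb - (2 * b - 8) * f3 > 0:
                pairs.append((f3, fb))
    return pairs


if __name__ == "__main__":
    for b in range(5, 9):
        print(b, lemma3_candidates(b))
```

For b = 5 the search returns the nine pairs listed in cases 1) to 4); for b = 6 and b = 7 only the pairs of cases 1) to 3) remain, and from b = 8 on only the four pairs of case 1).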

The subcase (1, b - 1) in case 1) is not possible, because otherwise
p_3 = 8(b - 1)/3 and p_b = 8/3, which is not an integer. The subcases (1, b),
(1, b - 3), (1, b - 2) of the case 1) are, respectively, the cases c, d and e of Lemma 3.

Cases 2 and 3 are not possible, because we get 3 3-gons on 3 consecutive edges
of each b-gon; so, the 3-gonal neighbour of the 3-gon in the middle will be adjacent
to 2 3-gons, a contradiction.

The subcase (3, 4) of 4 is not possible, since the 3-gonal neighbours of 2 adjacent
5-gons containing a vertex of the intersection would share an edge.

In the subcase (2, 3) of 4, all 3 triangles neighbouring a pentagon in a row
would imply a triangle neighbouring 2 other ones, so assume we have a 5-gon and
3 neighbouring 3-gons not all in a row. But then one of the 5-gonal neighbours has
all 3-gonal neighbours in a row - again a contradiction.

The remaining subcase (3, 5) of 4 is case b of Lemma 3.

□ 

Lemma 4 The cases d and e of Lemma 3 are realized only by polyhedron Nr. 3 in
List 4 and the tetrakis of suitably truncated Octahedra, given in Theorem 2.

Proof.

In both cases all 3-gons are organized in 4-cycles, surrounded by b-gons, since
otherwise we will get APrism_b. In case e of Lemma 3 the number of edges between
two b-gons is p_b (b - f(b,3))/2 = 4. So, the only possibility is b = 4 and Nr 3 of List
4 is the unique realization (it is the dual of the Octahedron, truncated on 2 opposite
vertices). In case d of Lemma 3, the number of (b - b)-edges is 12. This implies
b ∈ {4, 5, 6} and we get the tetrakis of 3 suitably truncated Octahedra, the first one
being Nr 5 of List 4 (the elongated Nr 3 of the List).

□ 

Theorem 3 All face-regular cubic (3, b)-polyhedra have b ≤ 10.

They are 14 special truncations of the Tetrahedron, the Cube and the Dodecahedron:


• the 1- and 4-truncated Tetrahedron;

• 2(b - 4)-truncated Cubes (one for b ∈ {5, 7, 8} and two for b = 6);

• 4(b - 5)-truncated Dodecahedra (one for b = 6, 9, 10 and two for b = 7, 8).

Proof. 

Due to the 3-connectedness of polyhedra we get f(3,b) = 3 for all cubic (3, b)-
polyhedra. So each triangle is isolated and 3p_3 ≤ v = p_3(2 + 6/f(b,3)) - 4. Together
with equality (*) and p_3 > 0 we get f(b,3) ≥ b - 5 and f(b,3) ≤ min(5, b/2). So
b ≤ 10.

Actually this is a result by Malkevitch ([Mal70]) for general (that is, not only
face-regular) cubic (3, b)-polyhedra.

The remaining possibilities for b > 6 are (b, f(b,3); v) ∈ {(10, 5; 60), (9, 4; 52),
(8, 3; 44), (7, 2; 36), (8, 4; 24), (7, 3; 20)}. The first 4 cases are realized by truncations
of the Dodecahedron giving only one polyhedron in the first two cases and two
non-isomorphic polyhedra in the others. The last 2 cases are realized by truncations
of the Cube (giving a unique polyhedron in every case). For b ≤ 6 all wanted
polyhedra are Nrs 4, 5 and 10-13 of List 1.
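The six parameter sets for b > 6 can be regenerated from the counting argument. The sketch below assumes p_3 = 4 f/(f - b + 6) with f = f(b,3) (this is (*) for k = a = 3 and f(3,b) = 3), p_b = 3 p_3/f and v = 2(p_3 + p_b) - 4, together with the bounds b - 5 ≤ f ≤ min(5, b/2); the bound b/2 is taken here as an assumption expressing that isolated triangles cannot sit on two consecutive edges of a b-gon.

```python
from fractions import Fraction


def cubic_3b_cases(max_b=14):
    """Admissible (b, f(b,3), v) for cubic face-regular (3, b)-polyhedra with b > 6."""
    cases = []
    for b in range(7, max_b + 1):
        for f in range(max(1, b - 5), min(5, b // 2) + 1):
            p3 = Fraction(4 * f, f - b + 6)   # f >= b - 5 keeps this positive
            pb = 3 * p3 / f                   # from 3 * p_3 = f * p_b
            v = 2 * (p3 + pb) - 4             # Euler's formula for a cubic graph
            integral = p3.denominator == 1 and pb.denominator == 1
            if integral and 3 * p3 <= v:      # isolated triangles use 3 vertices each
                cases.append((b, f, int(v)))
    return cases


if __name__ == "__main__":
    print(cubic_3b_cases())
```

The search stops producing candidates after b = 10, so the upper bound max_b = 14 is only a safety margin.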

Nrs  1-3  of  List  2  are  the  6-truncated  Cube  and  two  8-truncated  Dodecahedra. 
For b > 6 there remain 3 (3,8)-polyhedra, one (3,9)- and one (3,10)-polyhedron.

□ 

Remark 

If we do not require 3-connectedness in Theorem 3, more graphs exist, e.g. one
for every b ≥ 9 divisible by 3, with p_b = 2, p_3 = 2b/3, v = 4b/3.
Theorem 4 There is only one infinite family of cubic (4, b)-polyhedra, that is Prism_b
for any b > 3.

The finite families are:

1) Two 80-vertex (4,7)-polyhedra, coming as the truncation of the dual
Rhombicuboctahedron or dual Miller's solid (its twist) on all 18 4-valent vertices;

2) 14 polyhedra, coming from those of Theorem 3 by the 4-triakon decoration; they
have (b, v) ∈ {(15,140), (13,116), (11,92), (9,68), (7,44), (12,56), (10,44),
(8,32), (6,20); (9,28), (6,14)}. There are exactly 2 non-isomorphic polyhedra
for the 3rd, 4th and 8th case and one for the others.

3) 3 polyhedra (besides the prism) for b = 5 and 6 for b = 6 not covered by previous
cases: Nrs 7-9 and 17-22 of List 1;

4) 2 44-vertex (4,7)-polyhedra Nrs 5, 6 of List 2 (coming by suitable doubling of all
6 isolated 4-gons in Nrs 21, 20 of List 1);

5) the 80-vertex (4,8)-polyhedron (coming by suitable doubling of all 12 isolated
4-gons of the truncation on all 12 4-valent vertices of the dual of the unique
18-vertex 4-valent (3,4)-polyhedron given below, the 3-gons of which are organized
into 2 isolated ones and 3 isolated pairs).


Proof.

Possible values for f(4,b) are 2, 3 and 4. The case f(4,b) = 2 is possible only if
the 4-gons form a ring (giving a prism) or isolated 3-rings of 4-gons, which is exactly
case 2 of Theorem 4.

If f(4,b) = 4, all 4-gons are isolated. So 4p_4 ≤ v. The equality (*) and p_4 > 0
imply 2(b - 6) < f(b,4) ≤ min(3, b/2). So b ≤ 7 and the only possibility for b = 7 is
f(b,4) = 3, p_4 = 18, p_7 = 24; v = 80. It is exactly case 1 of Theorem 4.

In the remaining case f(4,b) = 3, the 4-gons are organized into isolated adjacent
pairs. So 6 p_4/2 ≤ v and, using (*) and p_4 > 0, we get 3(b - 6)/2 < f(b,4) ≤ min(5, 2b/3),
which implies b ≤ 9. Moreover, the only possible values for (b, f(b,4)) are (9,5),
(8,4), (7,2), (7,3) and (7,4). The last subcase is not possible, because it gives
p_4 = 48/5. The subcase (7,3) gives v = 44. A computer search gave that it
is exactly case 4 of the theorem. The remaining subcases leave 3 possibilities:
(b, f(b,4); p_4, p_b; v) ∈ {(7, 2; 24, 36; 116), (8, 4; 24, 18; 80), (9, 5; 60, 36; 188)}. The first
of them can easily be removed by a geometric argument. For the last one there are
8 vertices contained only in 9-gons. It is easy to show that only one out of two
possible ways in which the pairs of 4-gons can neighbour a 9-gon containing such a
vertex can occur. Using this, geometric arguments give a contradiction when trying
to construct the polyhedron. The middle one is realized by the polyhedron of case
5 of the theorem. The uniqueness follows from its construction.
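The parameter sets quoted for the case f(4,b) = 3 can be recomputed directly. This sketch assumes p_4 = 12 f/(2f - 3(b - 6)) with f = f(b,4), which is (*) for k = 3, a = 4 and f(4,b) = 3, together with p_b = 3 p_4/f and v = 2(p_4 + p_b) - 4; the helper name `pair_case` is hypothetical.

```python
from fractions import Fraction


def pair_case(b, f):
    """Return (p_4, p_b, v) for a cubic (4, b)-polyhedron with f(4,b) = 3, f(b,4) = f."""
    p4 = Fraction(12 * f, 2 * f - 3 * (b - 6))  # requires f > 3(b - 6)/2
    pb = 3 * p4 / f                              # from 3 * p_4 = f * p_b
    v = 2 * (p4 + pb) - 4                        # Euler's formula for a cubic graph
    return p4, pb, v


if __name__ == "__main__":
    for b, f in [(7, 2), (7, 3), (7, 4), (8, 4), (9, 5)]:
        p4, pb, v = pair_case(b, f)
        print(f"(b, f(b,4)) = ({b}, {f}): p4 = {p4}, pb = {pb}, v = {v}")
```

The subcase (7,4) yields a non-integer p_4, and in the subcase (9,5) the 30 pairs of 4-gons cover 180 of the 188 vertices, which leaves the 8 vertices lying only in 9-gons.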

Clearly, all remaining wanted polyhedra have b ≤ 6 and so they are covered by
List 1; it gives the last entry of case 3 in Theorem 4.

□ 

Remark 

The  largest  face-regular  cubic  (4,6)-  and  (5, 6)-polyhedra  are  also  (just  like  in 
case  1  of  Theorem  4)  the  dual  tetrakis  or  dual  snub  Cube  and  dual  pentakis  (putting 
pyramids  on  all  5-gonal  faces)  of  the  snub  Dodecahedron. 

Theorem 5 There is only one infinite family of face-regular cubic (5, b)-polyhedra
with b > 6:

Barrel_b (two b-gons, separated by two rows of b 5-gons) for any b.

The finite families are:

1) 12 (5,6)-polyhedra: Nrs 23-34 of List 1;

2) a unique 92-vertex (5,7)-polyhedron, organized into concentric 3-, 15- and
12-rings of 5-gons, separated by 6-, 9- and 3-rings of 7-gons (as given below);

3) a unique (5,7)-polyhedron (Nr 9 in List 2) and 3 polyhedra with f(5,b) = 2,
also given below: the unique 140-vertex (5,10)-polyhedron, the unique 56-vertex
(5,8)-polyhedron and the unique 92-vertex (5,8)-polyhedron with (p_5, p_8; f(8,5)) =
(36, 12; 6).


Proof. 

The case f(5,b) = 1 gives exactly Barrel_b. If f(5,b) = 5, then all 5-gons are
isolated; so 5p_5 ≤ v. So (*) and p_5 > 0 imply 5(b - 6) < f(b,5) ≤ 3, giving b < 7.

If f(5,b) = 4, then all 5-gons are organized into isolated pairs; so 8 p_5/2 ≤ v. Again
we get 4(b - 6) < f(b,5) ≤ 3 and b < 7.

The case f(5,b) = 3 has 5-gons organized in disjoint rings. Let t denote the
number of 3-rings among them, so 3p_5 + t ≤ v. The same count as above gives
3(b - 6) < f(b,5) ≤ 5. So b = 7 is the only possibility for b > 6.

In the case b = 7 we have either f(7,5) = 4 and t ≤ 20, or f(7,5) = 5 and
t ≤ 2. The first subcase gives p_5 = 48, p_7 = 36 and it should be a 164-vertex
(5,7)-polyhedron with all 48 5-gons organized into isolated rings. V.P. Grishukhin
(private communication) established, case by case, non-existence in this subcase.
The second subcase is f(7,5) = 5, i.e. 7-gons also should be organized in isolated
rings. We get p_5 = 30, p_7 = 18; v = 92. So 5- and 7-gons should be organized in
concentric alternating rings and only two vertices belong to three faces of the same
type. It is easy to show that case 2 of the theorem is the unique possibility for such
a polyhedron.

All possibilities with b = 6 are covered by List 1 (it is case 1 of the theorem). The
only remaining case is f(5,b) = 2, b > 6. Using (*), we get 2(b - 6) < f(b,5) ≤ b.
Clearly, p_b = 2p_5/f(b,5) = 24/(f(b,5) - 2(b - 6)) and
v = 2p_5 + 2p_b - 4 = 24(f(b,5) + 2)/(f(b,5) - 2(b - 6)) - 4. So all possibilities
with positive integer v are given by f(b,5) = b - i, b ≤ 11 - i, for 0 ≤ i ≤ 4. In
subcase f(b,5) = b the b-gons are all isolated. It is easy to see that b should be even
and that for b ∈ {8, 10} the only possibilities are the 140- and 56-vertex polyhedra
in case 3 of the theorem. In subcase f(b,5) = b - 1 an attempt to construct the
structures easily gives the impossibility for b ∈ {9, 8} and unicity (the 44-vertex
polyhedron of case 3 of the theorem) for b = 7. The impossibility of cases (b, f(b,5); v) =
(7, 5; 52), (7, 4; 68) was checked with the help of a computer. The remaining 4
cases should be (5, b)-polyhedra with f(5,b) = 2, having (b, f(b,5); p_5, p_b; v) =
(7, 3; 36, 24; 116), (8, 5; 60, 24; 164), (8, 6; 36, 12; 92), (9, 7; 84, 24; 212). The third
possibility is realized, uniquely, by the 92-vertex polyhedron of the case 3 in the theorem.
It and the non-existence in the other 3 subcases can be checked in the following easy way.
Let us fix a b-gon A_0 = (1, 2, ..., b) and, without loss of generality, suppose that
another b-gon, say A_1, is adjacent to A_0 by the edge (1,2). In all 4 subcases it is not
possible that A_0 is adjacent to a 5-gon by the edge (2,3) and to a b-gon by the edge (3,4),
because then this 5-gon will have 3, not 2, b-gons as neighbours. Now, consider the
situation when A_0 is adjacent to 5-gons by (2,3) and by (3,4), but not by (4,5). A try
to construct a polyhedron, respecting our conditions on f(5,b), f(b,5), will continue
uniquely and lead to impossibility always except the subcase 3. (Most difficult is
the situation when all b - f(b,5) b-neighbours of the original b-gon A_0 are adjacent
to it in a row: by (1,2), (2,3) and so on.)
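The arithmetic of the case f(5,b) = 2 can be tabulated as follows. This is a sketch under the assumption that (*) specialises, for k = 3, a = 5 and f(5,b) = 2, to p_5 = 12 f/(f - 2(b - 6)) with f = f(b,5), so that p_b = 2 p_5/f and v = 2(p_5 + p_b) - 4; the function name and the dictionary layout are choices made here, not part of the paper.

```python
from fractions import Fraction


def two_neighbour_cases(max_i=4):
    """Integral parameter sets for cubic (5, b)-polyhedra with f(5,b) = 2, b > 6.

    Keys are (b, f(b,5)) with f(b,5) = b - i and b <= 11 - i; values are
    (p_5, p_b, v).  Non-integral candidates are dropped.
    """
    cases = {}
    for i in range(max_i + 1):
        for b in range(7, 11 - i + 1):
            f = b - i
            d = 12 - i - b                  # equals f - 2(b - 6), positive here
            p5 = Fraction(12 * f, d)
            pb = Fraction(24, d)
            v = 2 * (p5 + pb) - 4
            if p5.denominator == 1 and pb.denominator == 1:
                cases[(b, f)] = (int(p5), int(pb), int(v))
    return cases


if __name__ == "__main__":
    for key, value in sorted(two_neighbour_cases().items()):
        print(key, value)
```

The table contains the 56- and 140-vertex polyhedra (f(b,5) = b), the 44-vertex one (f(7,5) = 6) and the six parameter sets with i ≥ 2 discussed in the proof; existence still has to be decided case by case, as the proof explains.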


Remark  1 

The polyhedron Nr 24 of List 1 (i.e. the fullerene F28(Td)) and the 3-rd and 2-nd
polyhedron in case 3 of Theorem 5 come by a 5-triakon decoration of the (fully)
truncated Tetrahedron, Octahedron and Icosahedron, respectively.

The 4-th polyhedron of the case 3 of Theorem 5 (the 92-vertex (5,8)-polyhedron)
comes from the face-regular fullerene F56(Td) (Nr 29 of List 1) by the following decoration
of all 6 of its hexagons that have two adjacent 5-gons on opposite
edges: put an "H" with sides parallel to these opposite edges, so that the hexagon
is partitioned into 4 pentagons. This face-regular fullerene comes itself by
a 5-triakon decoration (another one than the one producing the above 56-vertex
(5,8)-polyhedron) of a face-regular 24-vertex (4,6)-polyhedron: the truncated Octahedron.

Remark 2

The above Theorems 3, 4 and 5 together give the following classification:

Besides two infinite families, Prism_b and Barrel_b, there are exactly 57 3-valent
face-regular bifaced polyhedra. With respect to the number v of vertices, they have
the face-sizes (a, b) as follows.

for v = 140: (4,15), (5,6), (5,10);
for v = 116: (4,13);
for v = 92: two (4,11), (5,7), (5,8);
for v = 80: two (4,7), (4,8), two (5,6);
for v = 68: two (4,9), (5,6);
for v = 60: (3,10), (5,6);
for v = 56: (4,6), (4,12), (5,6), (5,8);
for v = 52: (3,9);
for v = 44: two (3,8), three (4,7), (4,10), (5,6), (5,7);
for v = 38: (5,6);
for v = 36: two (3,7);
for v = 32: (3,8), two (4,6), two (4,8), (5,6);
for v = 28: (3,6), (4,9), (5,6);
for v = 24: (4,6), (5,6);
for v = 20: (3,7), (4,6);
for v = 16: two (3,6), (4,5);
for v = 14: (4,5), three (4,6);
for v = 12: (3,5), (3,6), (4,5).
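The total of 57 can be checked by tallying the list. The sketch below transcribes the entries exactly as printed above, as tuples (v, a, b, multiplicity); the transcription and the helper `tally` are illustrative and carry no information beyond the list itself.

```python
from collections import Counter

# (v, a, b, multiplicity), transcribed from the list as printed
CENSUS = [
    (140, 4, 15, 1), (140, 5, 6, 1), (140, 5, 10, 1),
    (116, 4, 13, 1),
    (92, 4, 11, 2), (92, 5, 7, 1), (92, 5, 8, 1),
    (80, 4, 7, 2), (80, 4, 8, 1), (80, 5, 6, 2),
    (68, 4, 9, 2), (68, 5, 6, 1),
    (60, 3, 10, 1), (60, 5, 6, 1),
    (56, 4, 6, 1), (56, 4, 12, 1), (56, 5, 6, 1), (56, 5, 8, 1),
    (52, 3, 9, 1),
    (44, 3, 8, 2), (44, 4, 7, 3), (44, 4, 10, 1), (44, 5, 6, 1), (44, 5, 7, 1),
    (38, 5, 6, 1),
    (36, 3, 7, 2),
    (32, 3, 8, 1), (32, 4, 6, 2), (32, 4, 8, 2), (32, 5, 6, 1),
    (28, 3, 6, 1), (28, 4, 9, 1), (28, 5, 6, 1),
    (24, 4, 6, 1), (24, 5, 6, 1),
    (20, 3, 7, 1), (20, 4, 6, 1),
    (16, 3, 6, 2), (16, 4, 5, 1),
    (14, 4, 5, 1), (14, 4, 6, 3),
    (12, 3, 5, 1), (12, 3, 6, 1), (12, 4, 5, 1),
]


def tally(census):
    """Count polyhedra by the smaller face size a and in total."""
    by_a = Counter()
    for _v, a, _b, multiplicity in census:
        by_a[a] += multiplicity
    return dict(by_a), sum(by_a.values())


if __name__ == "__main__":
    by_a, total = tally(CENSUS)
    print(by_a, total)
```

The tally gives 13 polyhedra with a = 3, 28 with a = 4 and 16 with a = 5, which adds up to the 57 stated in the remark.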
Abstract

It is shown quite generally that the ground state energy of two atoms in
infinite space, interacting via a spherical potential which depends only on
the distance between the particles, is the lowest in two dimensions. Using
a variational procedure, binding energies of helium diatomic molecules,
in infinite and restricted space, are obtained as well. The results
derived for helium atoms are in accordance with the lemma.


PACS:  36.90.+f,  31.20.Di 


1  Introduction 


Many physical phenomena in nature are related to the behaviour of a small
number of particles. Among them, in low temperature physics, are
superconductivity, superfluidity and Bose-Einstein condensation. Especially interesting
and important cases are systems in which the particles are helium atoms: helium
liquids, helium films, liquid drops, atoms in cavities in solid matrices and in
nanotubes.

The consideration of small systems begins with the study of two atoms. They
can be located in both restricted and unrestricted space: in 3 dimensions
(3 D), 2 dimensions (2 D) and 1 dimension (1 D). Of course, the real physical
world exists in finite three-dimensional space. In making models
of different physical situations we are led to consider 2 D and 1 D space.
In such circumstances many physical effects are dominant in the corresponding
dimension.

In this paper, in Sec. I, we prove a general lemma. It relates the ground state
energies of two particles in 1, 2 and 3 dimensions in infinite space. It is
assumed that the particles interact via a spherical potential depending only on the
distance between them. In Sec. II, using a variational procedure and employing
the newest potential of the interaction between helium atoms [1], the ground
state energies of helium molecules are obtained. The consistency with the
lemma is demonstrated.
2  Relations  between  ground  state  energies  in
different  dimensions

We consider two particles which interact via a spherically symmetrical
potential $V(\vec r_1, \vec r_2)$, in one-, two- and three-dimensional space. The Hamiltonian
of the system in the relative coordinates reads

$$H = -\frac{\hbar^2}{2\mu}\Delta + V(|\vec r_1 - \vec r_2|), \qquad (1)$$

where $\mu = \frac{m_1 m_2}{m_1 + m_2}$ is the reduced mass of the particles, and $m_1$ and $m_2$ are the masses
of the particles. In the ground state only the "radial" part of the Hamiltonian
is important and the operator $\Delta$ has the form


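\[ \Delta_1 = \frac{d^2}{dr^2}, \quad \text{in 1D} \tag{2} \]

\[ \Delta_2 = \frac{d^2}{dr^2} + \frac{1}{r}\frac{d}{dr}, \quad \text{in 2D} \tag{3} \]

\[ \Delta_3 = \frac{d^2}{dr^2} + \frac{2}{r}\frac{d}{dr}, \quad \text{in 3D.} \tag{4} \]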

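Inequalities between energies in different dimensions may be obtained from the
variational ansatz

\[ E_n \le \frac{\int \Psi_n H \Psi_n\, r^{n-1}\, dr\, d\Omega_n}{\int \Psi_n^2\, r^{n-1}\, dr\, d\Omega_n}, \tag{5} \]

where $n = 1, 2, 3$ denotes the dimension of physical space and
$d\Omega_1 = 1$, $d\Omega_2 = 2\pi$ and $d\Omega_3 = 4\pi$.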

As we study the ground state, and having in mind the symmetry of the system,
it is useful to write the trial wave functions in the form

\[ \Psi_1(r) = \Psi_{10}(r), \qquad \Psi_2(r) = \frac{1}{\sqrt{r}}\,\Psi_{20}(r), \qquad \Psi_3(r) = \frac{1}{r}\,\Psi_{30}(r). \tag{6} \]

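Introducing the trial wave functions into the variational ansatz (5) one finds

\[ E_1 \le \frac{1}{I_1}\left\{ -\frac{\hbar^2}{2\mu}\int_0^\infty dr\, \Psi_{10}\frac{d^2}{dr^2}\Psi_{10} + \int_0^\infty dr\, \Psi_{10}^2 V(r) \right\} \tag{7} \]

\[ E_2 \le \frac{1}{I_2}\left\{ -\frac{\hbar^2}{2\mu}\int_0^\infty dr\, \Psi_{20}\left(\frac{d^2}{dr^2} + \frac{1}{4r^2}\right)\Psi_{20} + \int_0^\infty dr\, \Psi_{20}^2 V(r) \right\} \tag{8} \]

\[ E_3 \le \frac{1}{I_3}\left\{ -\frac{\hbar^2}{2\mu}\int_0^\infty dr\, \Psi_{30}\frac{d^2}{dr^2}\Psi_{30} + \int_0^\infty dr\, \Psi_{30}^2 V(r) \right\} \tag{9} \]

where the normalization integrals read $I_n = \int_0^\infty \Psi_{n0}^2\, dr$, $n = 1, 2, 3$.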


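The relations (7), (8) and (9) are general. Assuming that $\Psi_1$ is the
eigenfunction in the one-dimensional case and taking $\Psi_{20} = \Psi_1$, from
(7) and (8) it follows that

\[ E_2 \le E_1 - \frac{1}{I_1}\frac{\hbar^2}{2\mu}\int_0^\infty dr\, \frac{\Psi_1^2}{4r^2}, \tag{10} \]

which means $E_2 < E_1$.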
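
The origin of the extra $1/4r^2$ term in the two-dimensional case can be checked directly from the radial operators in 2D and 3D and the substitutions $\Psi_2 = \Psi_{20}/\sqrt{r}$ and $\Psi_3 = \Psi_{30}/r$. The lines below are only a worked sketch of that algebra; primes denote derivatives with respect to $r$, a shorthand introduced here.

```latex
\begin{align*}
\Delta_2\!\left(r^{-1/2}\,\Psi_{20}\right)
  &= \left(\frac{d^2}{dr^2} + \frac{1}{r}\frac{d}{dr}\right) r^{-1/2}\,\Psi_{20} \\
  &= r^{-1/2}\,\Psi_{20}'' - r^{-3/2}\,\Psi_{20}' + \tfrac{3}{4}\, r^{-5/2}\,\Psi_{20}
     + r^{-3/2}\,\Psi_{20}' - \tfrac{1}{2}\, r^{-5/2}\,\Psi_{20} \\
  &= r^{-1/2}\left(\Psi_{20}'' + \frac{\Psi_{20}}{4r^2}\right), \\[4pt]
\Delta_3\!\left(r^{-1}\,\Psi_{30}\right)
  &= \left(\frac{d^2}{dr^2} + \frac{2}{r}\frac{d}{dr}\right) r^{-1}\,\Psi_{30}
   = r^{-1}\,\Psi_{30}'' .
\end{align*}
```

Multiplying by $\Psi_2\, r$ in 2D and by $\Psi_3\, r^2$ in 3D removes the prefactors, so the 3D kinetic term reduces to the 1D form while the 2D one keeps the additional attractive contribution $\Psi_{20}^2/4r^2$.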


If it is supposed that $\Psi_{30} = \Psi_{20}$, where $\Psi_2$ is the
eigenfunction in 2D, then from (8) and (9) one finds

\[ E_3 \le E_2 + \frac{1}{I_2}\frac{\hbar^2}{2\mu}\int_0^\infty dr\, \frac{\Psi_{20}^2}{4r^2}. \tag{11} \]


On the other hand, if $\Psi_3$ is the eigenfunction in 3D and
$\Psi_{20} = \Psi_{30}$, then

\[ E_3 \ge E_2 + \frac{1}{I_3}\frac{\hbar^2}{2\mu}\int_0^\infty dr\, \frac{\Psi_{30}^2}{4r^2}. \tag{12} \]


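The last two inequalities may be joined:

\[ E_2 + \frac{1}{I_3}\frac{\hbar^2}{2\mu}\int_0^\infty dr\, \frac{\Psi_{30}^2}{4r^2} \le E_3 \le E_2 + \frac{1}{I_2}\frac{\hbar^2}{2\mu}\int_0^\infty dr\, \frac{\Psi_{20}^2}{4r^2}. \tag{13} \]

From the above relation it follows that $E_2 < E_3$. By a similar
consideration, from (7) and (9) it follows that $E_3 = E_1$. In this way it is
proved that the binding energy of two interacting particles is the lowest
(most negative) in 2D. The result is independent of the statistics of the
particles.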
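
The ordering stated by the lemma can be illustrated numerically. The sketch below is not the calculation of this paper: it assumes a model potential (a hard core followed by a square well), units in which $\hbar^2/2\mu = 1$, and a simple finite-difference grid; the function name and all parameter values are hypothetical. It solves the reduced radial equation with and without the $-1/4r^2$ term and compares the lowest eigenvalues.

```python
import numpy as np

def ground_state_energy(centrifugal, v0=1.0, r_core=1.0, r_well=2.0,
                        r_max=40.0, n_points=800):
    """Lowest eigenvalue of -u'' + [V(r) - centrifugal / r**2] u = E u.

    Units: hbar**2 / (2 mu) = 1. The hard core at r_core and the outer wall
    at r_max are imposed through u = 0 at both ends of the grid.
    """
    r = np.linspace(r_core, r_max, n_points + 2)[1:-1]  # interior grid points
    h = r[1] - r[0]
    potential = np.where(r < r_well, -v0, 0.0)          # model square well
    diagonal = 2.0 / h**2 + potential - centrifugal / r**2
    off_diagonal = -np.ones(n_points - 1) / h**2
    hamiltonian = (np.diag(diagonal)
                   + np.diag(off_diagonal, 1)
                   + np.diag(off_diagonal, -1))
    return np.linalg.eigvalsh(hamiltonian)[0]

e_1d_3d = ground_state_energy(centrifugal=0.0)   # reduced 1D and 3D problems
e_2d = ground_state_energy(centrifugal=0.25)     # reduced 2D problem
print(f"E1 = E3 = {e_1d_3d:.5f},  E2 = {e_2d:.5f}")
```

Because the 2D problem differs from the other two only by a negative term on the diagonal, its lowest eigenvalue is necessarily the smaller one on any grid, which mirrors the argument of the lemma.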


3  Ground  state  energy  of  diatomic  helium  molecules 


In order to describe physical systems that contain helium, many interatomic
potentials have been obtained. One of the best is the ab initio SAPT potential
by Korona et al. [1]; its extended forms by Janzen and Aziz [2], SAPT1 and
SAPT2, include retardation effects. Since the SAPT potential is so precise, it
is expected that the effect of retardation forces could be examined
experimentally. It reads

\[ V(r) = \varepsilon V^*(r), \tag{14} \]

\[ V^*(r) = A \exp(-\alpha r + \beta r^2) - \sum_{n=3}^{8} f_{2n}(r, b)\, \frac{c_{2n}}{r^{2n}}, \tag{15} \]

where

\[ f_{2n}(r, b) = 1 - e^{-br} \sum_{k=0}^{2n} \frac{(br)^k}{k!}, \tag{16} \]

and the parameters are

ε = [value illegible in the source]     c_6  = 0.03207856 Å^6
A = 20.7436426                          c_8  = 0.08680214 Å^8
                                        c_10 = [value illegible in the source]
α = 3.56498393 Å^-1                     c_12 = 1.57407624 Å^12
β = -0.22141687 Å^-2                    c_14 = 10.31938196 Å^14
b = 3.68239497 Å^-1                     c_16 = 86.00126516 Å^16
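
The damping functions that multiply the dispersion terms are easy to evaluate. The following is a minimal sketch, assuming the usual Tang-Toennies form $f_{2n}(x) = 1 - e^{-x}\sum_{k=0}^{2n} x^k/k!$ with $x = br$; the function name is hypothetical, and only the value of b is taken from the parameter list.

```python
import math

def tang_toennies_damping(order, x):
    """f_order(x) = 1 - exp(-x) * sum_{k=0}^{order} x**k / k!, with x = b * r."""
    partial_sum = sum(x**k / math.factorial(k) for k in range(order + 1))
    return 1.0 - math.exp(-x) * partial_sum

b = 3.68239497  # 1/angstrom, from the parameter list
for r in (1.0, 3.0, 6.0):  # separations in angstrom
    values = [tang_toennies_damping(2 * n, b * r) for n in range(3, 9)]
    print(r, [round(v, 4) for v in values])
```

Each function rises from 0 at short range to 1 at large separation, and the higher orders switch on later, so the $r^{-2n}$ terms are suppressed where they would otherwise diverge.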


First, let us calculate the binding energy of two helium atoms in infinite
space. For the ground state we found that a good analytic form of the
functions (6) in all dimensions is

\[ \Psi_{i0}(r) = \exp\left[ -\left(\frac{a}{r}\right)^{\gamma} - s r \right], \tag{17} \]

where $i = 1, 2, 3$; $a$, $\gamma$ and $s$ are variational parameters and of
course have different optimal values in 3D (1D) and 2D. The same form of pair
correlations in 3D has recently been used by Bruch [3] to examine the
properties of boson trimers. In 2D, we use the form employed in Ref. [4], which
provides a slight improvement over a variational wave function introduced in
Ref. [5]. The binding energy and the parameters are obtained in a minimization
procedure. The results are shown in Table 1. In order to assess our
variational calculation and to compare the results, the corresponding
numerical solutions of the Schroedinger equation are presented for the
HFD-B3-FCI1 [6] and SAPT potentials as well.


Second, we concentrate on two helium atoms confined by a hard-walled spherical
potential in 3D and a circular one in 2D. As was demonstrated in Ref. [4], good
variational wave functions of the ground state are

\[ \Psi_{30}(r; d) = \Psi_{30}(r)\, j_0(\pi r/d) \tag{18} \]

in 3D, and

\[ \Psi_{20}(r; d) = \Psi_{20}(r)\, J_0(2.404826\, r/d) \tag{19} \]

in 2D. Here $d$ is the diameter of the sphere and of the circle, $j_0$ is the
spherical Bessel function and $J_0$ is the zeroth-order Bessel function. As in
infinite space, the ground state energy of the non-interacting system must be
subtracted. The energy of two free particles is $\epsilon_i/d^2$, $i = 2, 3$,
where $\epsilon_2 = \hbar^2 (2.404826)^2/2\mu$ in 2D and
$\epsilon_3 = \hbar^2 \pi^2/2\mu$ in 3D. The results for $d = 50$ Å and
$d = 100$ Å are presented in Table 2.
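
To give a feeling for the size of the subtracted free-particle energies, the sketch below evaluates $\hbar^2 x^2/(2\mu d^2)$ for a pair of $^4$He atoms, with $x = \pi$ in 3D and $x = 2.404826$ in 2D. The physical constants are CODATA values, the use of the atomic (not nuclear) mass is an assumption, and the function name is hypothetical.

```python
import math

HBAR = 1.054571817e-34                # J s
K_B = 1.380649e-23                    # J/K
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg
MASS_HE4 = 4.002603254                # atomic mass of 4He in u (assumed here)

def free_pair_energy_mK(diameter_angstrom, dimension,
                        mass_1=MASS_HE4, mass_2=MASS_HE4):
    """Ground-state energy (mK) of the free relative motion in the confinement."""
    mu = mass_1 * mass_2 / (mass_1 + mass_2) * ATOMIC_MASS_UNIT
    hbar2_over_2mu = HBAR**2 / (2.0 * mu) / K_B * 1e20  # K * angstrom**2
    root = {2: 2.404826, 3: math.pi}[dimension]         # first zero of J0 or j0
    return 1e3 * hbar2_over_2mu * root**2 / diameter_angstrom**2

for d in (50.0, 100.0):
    print(d, free_pair_energy_mK(d, 3), free_pair_energy_mK(d, 2))
```

For d = 50 Å these energies are tens of mK, comparable to the binding energies themselves, which is why the subtraction matters in the confined case.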

4  Discussion


Let us mention that only the helium-4 dimer in 3D has been observed
experimentally [7-9] up to now.

As seen in Table 1, the binding energies of all helium molecules are
consistent with the lemma. Moreover, both lighter molecules are not bound at
all in 3D. Two particles may be kept in 2D space by an external potential.
This can be realized, for example, in the space between two close, parallel,
large plates. Similarly, the interior of a long and thin cylinder may represent
1D space. Of course, these "confining" external potentials are not included in
the binding energies cited in Table 1.

Since in restricted geometry (in our case sphere and cylinder) the external
potentials are partly included, the lemma need not be valid. Of course, it
holds in this case as well if the parameters of the geometry (for instance, in
our case, the diameter of the sphere or cylinder) are much bigger than the
effective range of the interaction potential. Such behaviour can be recognized
in Table 2.

From the "exact" numerical solution of the Schroedinger equation [4] we know
that all combinations of two helium atoms are bound in finite space (in the
above sense); the same holds in infinite space, except for two atoms of
$^3$He, and for one atom of $^3$He with one atom of $^4$He, which are not bound
in 3D. Let us note that our trial function in the case of $(^3\mathrm{He})_2$
is not good enough to reproduce binding in 2D in both infinite and finite
space. As the comparison with the numerical solution shows, it is quite good
for the other cases.

It seems that the interior of a cylinder is the form which could be the
easiest to realize in an experiment. Although we have not solved this problem
theoretically, the main energetic characteristics are given by our spherical
models in 3D and 2D.

Finally, let us mention that our calculation in finite space is an approximate
one. Namely, we assumed that the center of mass of the two particles was
located at the center of symmetry of the space. It was shown in Ref. [4] that
this approximation gives the general features of the considered systems.

Table 1: Binding energies in infinite space (in mK) of helium molecules in
2D and, for the dimer $(^4\mathrm{He})_2$, in 3D (second line), obtained by
numerically solving the Schroedinger equation and by the variational
procedure; variational values are in round brackets; the parameters a (in Å),
γ (dimensionless) and s (in Å^-1) are shown for the SAPT potential only. Note
that our variational wave function is not flexible enough to predict a bound
state of the $(^3\mathrm{He})_2$ dimer in 2D and that the molecules
$(^3\mathrm{He})_2$ and $^3\mathrm{He}\,^4\mathrm{He}$ are not bound in 3D.


Molecule         HFD-B3-FCI1 (a)     SAPT (b)           a       γ       s
(^4He)_2, 2D     -39.4 (-37.7)       -40.7 (-39.93)     2.758   4.408   0.047
(^4He)_2, 3D     -1.559 (-1.480)     -1.871 (-1.762)    2.737   4.49    0.012
(^3He)_2         -0.016              -0.02
^3He ^4He        -4.0 (-3.21)        -4.3 (-3.51)       2.761   4.173   0.011

(a) Ref. [6]
(b) Ref. [1]

Table 2: Binding energies (in mK) of helium molecules in a sphere (3D, first
line) and in a circle (2D, second line), obtained by the variational procedure
for the SAPT potential; the diameters of both confinements are d = 50 Å and
d = 100 Å; the parameters a (in Å), γ (dimensionless) and s (in Å^-1) are shown
for d = 50 Å.


Molecule                    d = 50 Å     d = 100 Å    a       γ      s
[label illegible], 3D       -138.713     -40.650      2.753   4.41   -0.013
[label illegible], 2D       -61.660      -52.133      2.767   4.36    0.02
[label illegible], 3D       -67.086      -10.191      2.782   3.91   -0.058
[label illegible], 2D        73.159       14.827      2.798   3.87   -0.029
[label illegible], 3D       -94.354      -19.936      2.774   4.10   -0.042
[label illegible], 2D        16.718       -7.264      2.794   4.04   -0.011

Abstract

Similar molecular connectivity terms are capable of modeling many different
properties of quite different classes of compounds, such as alkanes, amino
acids, purines, pyrimidines and inorganic salts. The modeled properties are:
the pH at the isoelectric point, pI, the specific rotations, the solubility,
the side-chain molecular volume and the crystal densities of amino acids; the
solubility of purines and pyrimidines; the solubility of amino acids plus
purines and pyrimidines; the lattice enthalpies of metal halides; the unfrozen
water content of amino acids plus metal chlorides; and the motor octane numbers
of alkanes. The internal formal similarity of these different higher-level
molecular connectivity descriptors, which are derived by a trial-and-error
procedure from a medium-sized set of 8 molecular connectivity indices or a
subset of it, is striking. Nearly all of them are dominant terms, that is, they
are descriptors which are unable to further enhance the description when they
are used in combination with other descriptors.

INTRODUCTION

Recently, the modeling of different physicochemical properties of molecules or
materials has led to the discovery of quite interesting and powerful
descriptors, the molecular connectivity terms, $X = f(\chi)$, which are based
on graph-theoretical invariants, the molecular connectivity indices. These are
second- or higher-level descriptors, derived by a trial-and-error procedure on
a medium-sized set of molecular connectivity indices. It is thus possible to
detect that interesting relationships exist among many different molecular and
material properties. The descriptions achieved show that physicochemical
properties have a common root in the chemical graph or pseudograph
representation of a molecule, which is the starting material to construct the
molecular connectivity indices, the most widely used invariants in QSPR/QSAR
studies (see references therein). Interestingly, the form of these terms, used
to describe quite different properties of different classes of molecules, is
rather homogeneous.

In this short paper we will review the different properties which can be
modeled with the aid of molecular connectivity terms, check the advantages of
modeling with such terms over the normal molecular connectivity indices, and
indirectly stress how higher-order invariants derived from chemical graphs and
pseudographs can be the common descriptors of quite different properties of
different classes of compounds.

The modeled classes of compounds and properties are: for amino acids (AA): pH
at the isoelectric point, pI, side-chain molecular volume, V, specific
rotations, $SR_L$ ($SR_D$ for D-AA have just the opposite values), crystal
densities, CD, and solubility, S; for purines and pyrimidines: solubility, S;
for a mixed class of amino acids plus purines and pyrimidines: solubility,
S[AA+PP]; for metal halides: lattice enthalpies, $\Delta H_L^{\circ}$; for a
mixed class of amino acids plus metal chlorides: unfrozen water content, UWC;
and, finally, for alkanes: the motor octane number, MON.


METHOD 


Molecular connectivity terms are derived with a trial-and-error composition
procedure performed on a set of 8 molecular connectivity indices:
$\{D, D^v, {}^0\chi, {}^0\chi^v, {}^1\chi, {}^1\chi^v, \chi_t, \chi_t^v\}$.


The derivation of these indices from the corresponding graphs or pseudographs
(which allow multiple connections and loops) is straightforward and has
already been explained elsewhere. Sometimes this medium-sized set can be
restricted to a subset of optimal $\chi$ descriptors, derived with a
combinatorial technique, that is, a technique that searches the entire
combinatorial space spanned by the 8 indices, which means 255 combinations.
The choice of 8 main $\chi$ indices alone is made to ease the combinatorial
problem both at the level of the choice of the best index or indices and at
the level of the trial-and-error procedure used to derive the corresponding
terms. Interestingly, the molecular connectivity terms can sometimes be
further combined with $\chi$ indices to derive a mixed linear combination with
improved modeling capability, even if normally molecular connectivity terms
are dominant descriptors and do not allow the derivation of any linear
combination with improved modeling power. Formally, the molecular connectivity
terms have the following form

\[ X = f(a, b, \chi_i, \chi_j, \chi_k, \chi_l) = \frac{\chi_i + a\chi_j}{\chi_k + b\chi_l} \tag{1} \]

where a and b are optimization parameters that can be positive, negative or
zero. The indices $\chi_i$, $\chi_j$, $\chi_k$ and $\chi_l$ are indices of the
given molecular connectivity set. Further, each index can be raised to a
positive, negative or zero power. Normally, the trial-and-error search
technique is quite straightforward and convergence is easily reached: just
start with an optimal index; add the next one to it; after optimization of
this index, back-optimize the previous one; then construct the fraction and
act in the same way; finally, the a and b coefficients as well as the powers
are added and optimized.
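
The two counting and composition steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the term is taken to have the two-over-two fraction form $(\chi_i^{p_i} + a\chi_j^{p_j})/(\chi_k^{p_k} + b\chi_l^{p_l})$, the index names are plain-text stand-ins for the eight indices, and the numerical values in `example` are hypothetical.

```python
from itertools import combinations

# Plain-text stand-ins for the 8 molecular connectivity indices.
INDEX_NAMES = ("D", "Dv", "0chi", "0chiv", "1chi", "1chiv", "chit", "chitv")

def nonempty_subsets(names):
    """Yield every non-empty combination of the given index names."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)

def connectivity_term(chi, i, j, k, l, a=0.0, b=0.0,
                      powers=(1.0, 1.0, 1.0, 1.0)):
    """Evaluate (chi_i**p_i + a * chi_j**p_j) / (chi_k**p_k + b * chi_l**p_l)."""
    p_i, p_j, p_k, p_l = powers
    numerator = chi[i] ** p_i + a * chi[j] ** p_j
    denominator = chi[k] ** p_k + b * chi[l] ** p_l
    return numerator / denominator

n_combinations = sum(1 for _ in nonempty_subsets(INDEX_NAMES))
print(n_combinations)  # 2**8 - 1 non-empty subsets

# Hypothetical index values for one compound.
example = {"D": 6.0, "Dv": 6.0, "0chi": 3.4142, "0chiv": 3.4142,
           "1chi": 1.9142, "1chiv": 1.9142, "chit": 0.5, "chitv": 0.5}
term_value = connectivity_term(example, "D", "0chi", "1chi", "chit", a=0.5, b=2.0)
print(term_value)
```

In a trial-and-error search, the indices, the parameters a and b, and the powers would be varied one at a time while a fit statistic is monitored; that loop is left out here.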

The indices of set $\{\chi\}$ are based on the degree $\delta_i$ and the
valence degree $\delta_i^v$ (for the valence molecular connectivity indices)
of each vertex i of a molecular graph and pseudograph, respectively, where the
degree is the number of connections incident to that vertex. Pseudographs
allow for multiple connections and loops, that is, self-connections, which
contribute twice to $\delta_i^v$. For the components of metal halides the
following definition, $\delta_i^v = Z_i^v/(Z_i - Z_i^v - 1)$, has been chosen,
where $Z_i^v$ is the number of valence electrons and $Z_i$ is the atomic number
of the corresponding atom.
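
As an illustration of how such indices follow from vertex degrees, the sketch below uses the commonly quoted definitions $D = \sum_i \delta_i$, ${}^0\chi = \sum_i \delta_i^{-1/2}$, ${}^1\chi = \sum_{edges} (\delta_i\delta_j)^{-1/2}$ and $\chi_t = (\prod_i \delta_i)^{-1/2}$. These formulas are an assumption here, since the text refers elsewhere for them, and the function names are hypothetical. The example graph is the hydrogen-suppressed graph of n-butane.

```python
import math

def connectivity_indices(edges, n_vertices):
    """Non-valence indices of a simple hydrogen-suppressed molecular graph."""
    degree = [0] * n_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {
        "D": sum(degree),
        "0chi": sum(d ** -0.5 for d in degree),
        "1chi": sum((degree[u] * degree[v]) ** -0.5 for u, v in edges),
        "chit": math.prod(degree) ** -0.5,
    }

def valence_delta_heavy_atom(valence_electrons, atomic_number):
    """delta_v = Z_v / (Z - Z_v - 1), the definition quoted for metal halides."""
    return valence_electrons / (atomic_number - valence_electrons - 1)

n_butane = [(0, 1), (1, 2), (2, 3)]          # C-C-C-C chain
indices = connectivity_indices(n_butane, 4)
print(indices)
print(valence_delta_heavy_atom(7, 17))        # chlorine: Z = 17, Z_v = 7
```

Valence indices are obtained in the same way after replacing each degree by the corresponding valence degree.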

The aim of the modeling is to describe a set of properties with the aid of a
linear or multilinear relationship, $P = \sum_i m_i X_i$, where $X = \chi$
represents a special case, i ranges from 0 to n and, for i = 0, we have
$X_0 = \chi_0 \equiv 1$. The values of the $m_i$ constants are obtained with a
linear least-squares procedure. Normally, with X terms we have i = 0, 1, that
is, a simple linear relationship. Negative, meaningless results for modeled
properties can be avoided by using $|\sum_i m_i X_i|$, where the bars stand for
the absolute value, and the use of such an algorithm normally improves the
description. If, instead, negative values are experimentally justified, as is
the case for $SR_D$ and $SR_L$ of amino acids, then the bars are omitted. The
statistics used to check the validity of molecular connectivity indices or
terms are: the quality parameter, $Q = r/s$; the F ratio,
$F = f r^2/[(1 - r^2)\nu]$, where r = correlation coefficient, s = standard
deviation of the estimate, f = degrees of freedom and $\nu$ = number of $\chi$
and/or X parameters; the utility, $u_k = |c_k/s_k|$, of every descriptor,
including the unitary $X_0 \equiv 1$ term of the constant parameter; and the
average utility $\langle u \rangle = \sum u_k/m$ of the m descriptors.
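
A minimal sketch of how these statistics can be obtained from a least-squares fit is given below. It assumes that $f = n - \nu - 1$ and that $s_k$ is the standard error of the coefficient $c_k$; both are assumptions, as are the function name and the small data set, which is invented purely for illustration.

```python
import numpy as np

def regression_statistics(descriptors, prop):
    """Least-squares fit P = sum_k c_k X_k + c_0 and the statistics Q, F, u."""
    descriptors = np.asarray(descriptors, dtype=float)
    if descriptors.ndim == 1:
        descriptors = descriptors[:, None]
    prop = np.asarray(prop, dtype=float)
    n, nu = descriptors.shape
    design = np.column_stack([descriptors, np.ones(n)])  # last column: X0 = 1
    coeffs, *_ = np.linalg.lstsq(design, prop, rcond=None)
    residuals = prop - design @ coeffs
    f = n - nu - 1                                       # assumed degrees of freedom
    s = np.sqrt(residuals @ residuals / f)
    total = prop - prop.mean()
    r = np.sqrt(1.0 - (residuals @ residuals) / (total @ total))
    coeff_errors = s * np.sqrt(np.diag(np.linalg.inv(design.T @ design)))
    utilities = np.abs(coeffs / coeff_errors)
    return {"C": coeffs, "r": r, "s": s, "Q": r / s,
            "F": f * r**2 / ((1.0 - r**2) * nu),
            "u": utilities, "<u>": utilities.mean()}

# Invented data: one descriptor, six compounds.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
stats = regression_statistics(x, y)
print({key: np.round(value, 3) for key, value in stats.items()})
```

The constant term is placed last in the coefficient vector, so that C and u list the descriptors first and the unitary term at the end.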


DISCUSSION 

The experimental data as well as the molecular connectivity indices of the
different classes of compounds are collected in refs. 1-5 as well as in refs.
7 and 8.

Modeling the pH at the isoelectric point, pI, of amino acids

An appropriate and unusual term for this property of 21 amino acids, which can
be guessed even before any trial-and-error procedure is started, is the term
given by eq. 2. In fact, considerations concerning the number of basic and/or
acidic functional groups in amino acids are critical for this kind of
property. In this term, $\Delta n$ is the difference between $n_A$, the number
of acidic groups (2 for Asp and Glu, and 1 for all others), and $n_B$, the
number of basic groups (2 for Lys and His, 3 for Arg, and 1 for all others),
and $n_T = 3$ is the total number of functional groups; notice that for
$n_T = 2$, $\Delta n = 0$.


^pl  O^v 


For  X  -  Y  we  obtain  the  following  interesting  modeling  (where  the  rationale  for  notation 

is  ;  for  X  =  D”  ->  ^pi  “  ^  ~ 

(°Xl  ;  Q  =  2.12,  F  =  267,  r=  0.966,  s  =  0.46,  <u>  =  22.4 

Now, the term found is rather trivial, as it is nothing other than
$(1 + \Delta n/n_T)$, even if the following combination of 4 terms, with better
Q, r and s values, is less trivial


{V,  "X,  °X\  'X}p, :  Q  =  2.53,  F  =  95,  r  =  0.980,  s  =  0.39,  <u>  -  7.9 


But here the good F value of the single descriptor, together with its excellent
utility vector, u = (16.3, 28.4), is lost. In fact, the single utilities of
this last combination, with the exception of $u(X^0)$, are rather
disappointing, with u = (3.1, 2.8, 4.7, 2.8, 26.3). Now, a deeper
trial-and-error search reveals the existence of the following not at all
trivial term

X_pI = [expression not recoverable from the source]   (3)

with an interesting improvement in modeling power: Q = 3.412, F = 693,
r = 0.987, s = 0.29, <u> = 58, u = (26, 90), C = (77.99429, 5.75382). Not only
is the improvement in F and u more than expected but, further, this term is a
highly dominant term (or 'dead-end' term), as it does not allow any better
combination with any other index or term.
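
The reported statistics can be cross-checked against each other with simple arithmetic. The sketch below assumes $f = n - \nu - 1$ with n = 21 amino acids; the function names are hypothetical. Because r and s are quoted to only two or three digits, the recomputed Q and F agree with the quoted values only to within rounding.

```python
def quality(r, s):
    """Quality parameter Q = r / s."""
    return r / s

def f_ratio(r, n_points, n_descriptors):
    """F = f r**2 / ((1 - r**2) nu), assuming f = n - nu - 1."""
    f = n_points - n_descriptors - 1
    return f * r**2 / ((1.0 - r**2) * n_descriptors)

def average_utility(utilities):
    """Mean of the utilities, constant term included."""
    return sum(utilities) / len(utilities)

# Single-descriptor pI model: r = 0.966, s = 0.46, u = (16.3, 28.4).
print(quality(0.966, 0.46), f_ratio(0.966, 21, 1), average_utility((16.3, 28.4)))

# 'Dead-end' pI term: r = 0.987, s = 0.29, u = (26, 90).
print(quality(0.987, 0.29), f_ratio(0.987, 21, 1), average_utility((26, 90)))
```

The average utility of the second model comes out as exactly 58, matching the quoted <u>, which supports reading <u> as the plain mean over all coefficients.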

Modeling  the  side-chain  molecular  volume,  V,  of  L-amino  acids 

The  description  of  this  property  for  18  amino  acids  offers  some  interesting  clues 
about  use,  form  and  ratings  of  connectivity  terms.  The  best  %  and  LCCI  (linear  combination 
of  X  indices)  are;  :  Q  =  0.25,  F  =  691 ,  r  =  0.989,  s  =  4.0,  <u>  =  15,  and  (D,  x  ■■  X  )  '■ 

Q  =  0.43,  F  =  688,  r  =  0.997,  s  =  2.3,  <u>  =  4.8.  Now,  the  following  term  ’Xv  =  (D'')^  ^  /  V, 
found  by  a  trial-and-error  procedure  is  a  quite  poor  descriptor  with,  Q  =  0.031,  F  =  1 1, 
and  r  =  0.632.  But,  together  with  V  shows  the  following  surprising  modeling  and  utility: 

{V,  ^  Q  =  0-424,  F  =  989,  r  =  0.996,  s  =  2.4,  <u>  =  16,  u  =  (34,  5.3,  7.4).  We  might 
really wonder whether there is not, somewhere, a better term apt to describe
this property in a more satisfactory way. A deeper search discovers the
following term, which shows: Q = 0.438, F = 2109, r = 0.996, s = 2.3,
<u> = 32, and u = (46, 17)


(d'T  +  Czf' 

D'’-0.7-D 


(4) 


and whose correlation vector is: C = (18.1182, -52.5871). The statistical
improvement of this last term is quite impressive, and, clearly, it is a
dominant 'dead-end' term.

Modeling the specific rotations, SR, of L-amino acids

An  optimal  term  for  the  specific  rotation  SRl  of  n=16  L-AA  in  aqueous  solution,  can  be 
found  with  a  trial-and-error  composition  procedure  applied  to  the  optimal  restricted  set,  {D,  ”x. 
Xi),  which  rates  ;  Q  =  0.088,  F  =41.2;  r  =  0.955,  s  =  10.8,  <u>  =  7.2.  Found  term,  Xsr, 
shown in eq. 5, when a = 1, rates: Q = 0.04, F = 30, r = 0.830, s = 19.0,
<u> = 5.6. This term can also be used, with $C_L = -C_D$, to describe the
specific rotations $SR_D$ of the corresponding D-AA, which differ from $SR_L$
in their sign only.


{D  +  aXt) 


(5) 


The quality of this term becomes evident when it is compared with the quality
of the single best index, $\{\chi_t\}$, which has Q = 0.014, F = 3.2, r = 0.43,
s = 30.4, <u> = 2.1. Now, a combinatorial search on the set
$\{D, D^v, {}^0\chi, {}^0\chi^v, {}^1\chi, {}^1\chi^v, \chi_t, \chi_t^v, X_{SR}(1)\}$,
where $X_{SR}$ has here a = 1, allows one to find the following optimal
combination


$\{{}^1\chi, X_{SR}(1)\}$: Q = 0.100, F = 79.5, r = 0.961, s = 9.6, <u> = 10.6

A search with $X_{SR}(7)$ replacing $X_{SR}(1)$ achieves a worse description,
with $\{{}^1\chi, \chi_t, X_{SR}(7)\}$: Q = 0.097, F = 50, r = 0.962, s = 9.9,
<u> = 8.1. Thus, the chosen modeling vectors for $SR_L$ are:
X = $({}^1\chi, X_{SR}, X^0)$, $C_L$ = (26.28495, 965.8255, -545.67),
u = (7.6, 12.3, 11.8), where in the last vector we can notice the good utility
of every term.

The fact that the found family of molecular connectivity terms is not able to
yield a good enough single descriptor obliges us to deepen the trial-and-error
search around a term composed of 4 descriptors. The search ends up with the
following satisfactory dominant term, which, being a dead-end term, does not
allow the construction of any improved multilinear description


'z-izD 


0.3 


Z)“*+0.2te) 


0.02 


(6) 


that shows the following statistics and C vector: Q = 0.084, F = 112,
r = 0.943, s = 11.2, <u> = 10.7, u = (10.6, 10.9), C = (573.114, -430.56).


Modeling the crystal density, CD, of amino acids

The crystal densities, CD, of 10 amino acids cannot be satisfactorily modeled
with any $\chi$ index; in fact, the single-index description of this property
is quite poor:

{[index symbol illegible]}: Q = 3.43, F = 3.9, r = 0.57, s = 0.2, <u> = 5.4


Now, a trial-and-error search for a better descriptor finds the following
molecular connectivity term, whose improved description is nevertheless rather
disappointing

X_CD = [expression not recoverable from the source]:  Q = 5.86, F = 11.2, r = 0.76, s = 0.13, <u> = 5.3   (7)


The following more convoluted term of eq. 8 enhances the description
considerably, but still in an unsatisfactory way, with Q = 7.91, F = 20.4,
r = 0.848, s = 0.1, <u> = 5.4, u = (4.5, 6.4), the correlation vector being
C = (-0.50967, 4.81717). Both terms are dominant 'dead-end' terms.


(8) 


Modeling the solubility of amino acids, purines and pyrimidines and of [AA + PP]

We will now model the solubility (g per kg of H2O) of n = 20 amino acids, the
solubility of 23 purines and pyrimidines, and finally the solubility of a
mixed class of 43 amino acids plus purines and pyrimidines. The modeling of
the solubility of the two separate classes of compounds, AA or PP, has already
been achieved quite satisfactorily with the aid of supramolecular reciprocal
and supramolecular squared molecular connectivity terms, respectively.
Supramolecular connectivity indices are obtained by multiplying the molecular
connectivity indices by an association parameter, a, to take into account
supposed or detected association phenomena in solution (the $\chi_t$ and
$\chi_t^v$ indices are normally divided, as they are total reciprocal
indices). To model the entire class of [AA+PP] compounds the introduction of
the following set of supraindices is mandatory


{aD.Xt\  a°x-Xt\  aYxt'',  a'x'X.'',  a'x'^-Xt^  Xca^  X^-a' }  (9) 


where a = 8 for Pro, 2 for Ser, Hyp, and Arg, and 1 for the remaining amino
acids, while for PP we have a = 4 for 7PTp, 2 for 7Etb, ETp, and Cf, 1.5 for
7Itp and 1 for the remaining PP. Notice that the formulation of the given set
is slightly different from the one already given in a preceding paper. To
avoid burdening the resulting equations, the supraindices of this set will be
renamed into

\[ \{{}^{D}S, {}^{D}S^{v}, {}^{0}S, {}^{0}S^{v}, {}^{1}S, {}^{1}S^{v}, S_t, S_t^{v}\} \tag{10} \]

We will start by looking for a term to model the 20 amino acids (no Cys, but
with Hyp) using the given supraindices, as our final aim is to model the
entire heterogeneous class of [AA+PP]. The dominant 'dead-end' term of eq. 11
achieves a quite satisfactory modeling, with Q = 0.020, F = 1007, r = 0.991,
s = 49.3, <u> = 21, u = (32, 11)


D  c^v  D  nO.3 


Xs(^)  = 


S'"  -  s 


(z, +350- Zj) 


vsO.7 


whose correlation vector is C = (4650.56, -162.218).

The modeling of the 23 purines and pyrimidines can also be achieved by the
somewhat different dominant 'dead-end' term of eq. 12, with Q = 0.282,
F = 4005, r = 0.997, s = 3.5, <u> = 33, u = (64, 3.1), and
C = (11.6437, -1.53752)


XsiPP)  = 


^3"  +  ("gy-' 

{S,  -  0.002)'  ^ 


(12) 


The only disappointing note in this modeling is the poor utility of the unitary
term of the correlation vector. Notice that we are here modeling S in g per kg
of H2O, while in our preceding modeling of S(PP) the solubility was modeled in
g per 100 ml of H2O.

Let us now model the entire [AA + PP] class of compounds, starting with the
terms $X_S(AA)$ and $X_S(PP)$. Their modeling power for the whole
heterogeneous class of compounds is

$X_S(AA)$: Q = 0.020, F = 1079, r = 0.982, s = 50.0, <u> = 24, u = (33, 15)

$X_S(PP)$: Q = 0.010, F = 297, r = 0.937, s = 91.0, <u> = 9.9, u = (17, 2.6)

The term for the solubility of amino acids is quite good, and the term for the
solubility of PP is not at all disappointing. In fact, the first term, for
amino acids, is very similar in form and modeling power to the following term,
which has been found with a trial-and-error procedure over the entire class
[AA + PP]


Xs{AA  +  PP)  = 


te  +10' 


(13) 


This term rates: Q = 0.021, F = 1199, r = 0.983, s = 47, <u> = 23,
u = (35, 11), and the correlation vector is C = (7958.87, -100.596). This term
is also a dominant term, even if in combination with index D it shows a
somewhat improved Q statistic

$\{X_S(AA+PP), D\}$: Q = 0.024, F = 779, r = 0.987, s = 42, <u> = 17, u = (38, 3.6, 7.8)

Modeling  the  Lattice  Enthalpy  of  Metal  Halides 

The optimal modeling of the lattice enthalpies of 20 metal halides by
molecular connectivity indices and by the reciprocal molecular connectivity
indices seems rather satisfactory, especially at the level of the two-index
linear combination:

[label illegible]   Q = 0.015   F = 45    r = 0.846   s = 57.3   <u> = 17.2
[label illegible]   Q = 0.033   F = 115   r = 0.965   s = 29.0   <u> = 19.4
[label illegible]   Q = 0.019   F = 72    r = 0.895   s = 48.1   <u> = 29.8
[label illegible]   Q = 0.038   F = 147   r = 0.972   s = 25.9   <u> = 24.2

Now, a trial-and-error procedure finds a term, shown in eq. 14, based on two
types of molecular connectivity indices, that shows the following statistics:
Q = 0.037, F = 281, r = 0.969, s = 24.4, <u> = 41, u = (17, 65), with
correlation vector C = (1911.76, 623.102)


X_ΔH = [expression not recoverable from the source]   (14)

This dominant term allows for a somewhat improved description in combination
with a $\chi$ index,

$\{X_{\Delta H}, {}^1\chi^v\}$: Q = 0.041, F = 173, r = 0.976, s = 24.0, <u> = 15, u = (10.2, 2.2, 32)

but the improvement in Q does not suffice to overrule the worsening of F and
u; thus, $X_{\Delta H}$ can also be considered a 'dead-end' term.

Modeling the unfrozen water content, UWC, of amino acids and inorganic salts

The modeling of this property for 5 metal chlorides and 8 amino acids by
normal $\chi$ indices is quite disappointing, while the following
trial-and-error term of eq. 15 is quite satisfactory, with Q = 2.79, F = 328,
r = 0.984, s = 0.35, <u> = 12.8, and u = (18, 7.6)

A somewhat better modeling can be achieved if the absolute values of
$X_{UWC}$ are considered, that is, $|X_{UWC}|$. In this case we obtain the
following interesting statistics: Q = 3.14, F = 417, r = 0.987, s = 0.3,
<u> = 12.8, u = (20, 5.0), with C = (1.83423, 0.55209).
While  Xuwc  allows  for  an  enhanced  Q/u-combination  with  Xt'"  with  following  values,  Q  = 
3  n,  F  =  203,  <u>  =  9,1  Xuwc  1  term  is  a  strict  ‘dead-end’  term  allowing  no  enhanced  Q 
and  u  combinations. 


Modeling  the  motor  MON  octane  number  of  alkanes 

A trial-and-error search for a molecular connectivity term with the
$\{D, {}^0\chi, {}^1\chi, \chi_t\}$ set of alkanes, which do not have any
valence molecular connectivity indices, discovers the term of eq. 16, with the
following statistics: Q = 0.085, F = 146, r = 0.916, s = 10.8, <u> = 19.5,
u = (12.1, 27)


X_MON = [expression not recoverable from the source]   (16)

The correlation vector is C = (-1.6714, 121.02). This term offers the
possibility, through a combinatorial search, of finding a linear combination
with considerably enhanced Q, r and s statistics, but worse F and u statistics

{X_MON, D, [superscript illegible]χ}: Q = 0.129, F = 85, r = 0.965, s = 7.9, <u> = 7.5, u = (4.8, 5.6, 5.4, 5.4, 4.1)

CONCLUSION

Molecular connectivity terms, whose general type has been described in the
method section, seem to be the subtle thread that ties together the many
different properties of different classes of compounds. These terms, based on
molecular connectivity indices, are the latest step in the search for a
formally common descriptor for different properties of different molecular
structures. Not only is the similarity among the many different terms
striking, but the fact that normally a single term is sufficient for a good
description makes us confident that they represent a powerful tool to derive
and infer physicochemical information about molecules. The trial-and-error
procedure to derive molecular connectivity terms is easier than the
combinatorial technique used to detect the optimal set of best molecular
connectivity indices for a multilinear description of a property. This
procedure is in some way similar to a forward selection combinatorial
technique, where only the next best index is chosen, to which a
back-optimization step has been added. Further, this procedure allows one to
find more than one term that accomplishes its task in an optimal way. For
example, for UWC it is possible to find also the following term,

I [(0.1('xf ^+0.6'x'') I - V)] I , which rates Q = 3.12, F = 410, r = 0.987, s = 0.3, <u> = 12.8,
and u = (20, 5.3). The modeling of the solubility of [AA + PP], which can be accomplished
with the same term used to describe the solubility of amino acids, underlines the good
‘adaptability’ of these terms. And, as the search for terms belongs to the more general search for
new invariants, it is not at all odd to cite here W. Ostwald, who was very aware of the
importance of invariants in science: ‘The significance of a law of nature ... is the finding of
an invariant, that is to say, a quantity which remains unchanged even when all the other
determining elements vary within the possible limits imposed by the law. Thus, we perceive that
the historical development of scientific concept is ever associated with the discovering and
working out of such invariants, in them we behold the milestones which mark the track traversed
by human knowledge.’

Abstract

If  a  fullerene  is  defined  as  a  finite  trivalent  graph  made  up  solely  of  pentagons  and 
hexagons,  embedding  in  only  four  surfaces  is  possible:  the  sphere,  torus,  Klein  bottle 
and projective (elliptic) plane. The usual spherical fullerenes have 12 pentagons, elliptic
fullerenes 6, and toroidal and Klein-bottle fullerenes none. Klein-bottle and elliptic
fullerenes are the antipodal quotients of centrosymmetric toroidal and spherical fullerenes,
respectively.  Extensions  to  infinite  systems  (plane  fullerenes,  cylindrical  fullerenes  and 
space  fullerenes)  are  indicated.  Eigenvalue  spectra  of  all  four  classes  of  finite  fullerenes 
are  reviewed.  Leapfrog  fullerenes  have  equal  numbers  of  positive  and  negative  eigenvalues, 
with  0,  0,  2  or  4  eigenvalues  zero  for  spherical,  elliptic,  Klein-bottle  and  toroidal  cases, 
respectively. 


Introduction 

The discovery of the fullerene molecules and related forms of carbon such as nanotubes
has generated an explosion of activity in chemistry, physics and materials science,
which  is  amply  documented  elsewhere  [1-4].  In  chemistry,  the  ‘classical’  definition  is  that  a 
fullerene  is  an  all-carbon  molecule  in  which  the  atoms  are  arranged  on  a  pseudo-spherical 
framework made up entirely of pentagons and hexagons, which therefore necessarily includes
exactly 12 pentagonal rings. ‘Non-classical’ extensions to include rings of other sizes
have  been  considered  (e.g.  ref.  [5])  and  may  be  competitive  in  energy  with  the  classical 
fullerenes  in  some  ranges  of  nuclearity  (e.g.  ref.  [6]).  The  present  paper  is  concerned  with 
a  generalisation  in  a  different  direction:  what  fullerenes  are  possible  if  a  fullerene  is  a  finite 
trivalent  map  with  only  5-  and  6-gonal  faces  embedded  in  any  surface  (i.e.  a  2-manifold 
in  the  mathematical  sense)?  This  seemingly  much  larger  concept  leads  to  a  small  number 
of  well  defined  extensions  to  the  class  of  spherical  fullerenes,  actually  three  in  number.  Of 
these only the toroidal fullerenes are likely to have direct experimental relevance (indeed
observation  of  a  toroidal  ‘fullerene  crop  circle’  has  already  been  reported  [7])  but  all  three 
extensions  are  useful  in  placing  physical  fullerenes  in  a  wider  mathematical  context  and 
are  considered  in  this  light  here.  A  more  mathematical  treatment  of  the  concept  of  the 
extended  fullerenes  and  their  further  generalisation  to  higher  dimensional  spaces  is  given 
elsewhere  [8]. 

Classification  of  finite  fullerenes 

Define a fullerene in the wider sense as a finite, trivalent map on a surface,
consisting of only 5-gonal and 6-gonal faces. Each such object has n vertices, e edges and
f faces, of which $f_5$ are pentagons and $f_6$ hexagons. Infinite analogues of fullerenes will be
considered in a later section.

The Euler characteristic $\chi$ is defined as the number

$\chi = n - e + f$   (1)

which for a trivalent graph (hence having $2e = 3n$) made up of pentagons and hexagons
(and hence $2e = 5f_5 + 6f_6$) is

$\chi = f_5/6$.   (2)

For a surface in which a fullerene in this extended sense can be embedded, the number $\chi$ is
therefore a non-negative integer. In the well known classification of compact 2-manifolds,
any such manifold is homeomorphic to a sphere with g handles (if orientable) or to a sphere
with g cross-caps (if non-orientable)*. Hence, the Euler characteristic $\chi$ for a closed surface
(i.e. a surface without a boundary) is also given by

$\chi = 2(1 - g)$   (for an orientable surface)
$\chi = 2 - g$   (for a non-orientable surface).   (3)

The cases compatible with non-negative integral solutions for $\chi$ are thus exactly four in
number. The only surfaces admitting finite fullerene maps in the sense of our definition are
therefore: $S^2$ (the sphere, orientable with g = 0), $T^2$ (the torus, orientable with g = 1), $K^2$
(the Klein bottle, non-orientable with g = 2) and $P^2$ (the real projective plane, also called
the elliptic plane, non-orientable with g = 1). All embeddings are 2-cell, meaning that each
face is homeomorphic to an open disk. An immediate consequence of Euler’s formula is
that fullerenes on $S^2$, $T^2$, $K^2$ and $P^2$ have exactly 12, 0, 0 and 6 pentagons, respectively.
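
The counting argument above can be checked with a few lines of Python. The sketch below is purely illustrative (the function names are ours and do not come from the paper): it evaluates the Euler characteristic from the genus formulas, uses the relation $f_5 = 6\chi$ that follows from $2e = 3n = 5f_5 + 6f_6$, and keeps only the surfaces for which the pentagon count is non-negative.

```python
def euler_characteristic(g, orientable):
    """Closed surface: 2(1 - g) for g handles, 2 - g for g cross-caps."""
    return 2 * (1 - g) if orientable else 2 - g


def pentagon_count(chi):
    """Trivalent map with only 5- and 6-gonal faces: chi = f5 / 6."""
    return 6 * chi


def admissible_surfaces(max_genus=5):
    """Map (orientable, g) -> f5 for every surface with f5 >= 0."""
    found = {}
    for orientable in (True, False):
        # A non-orientable closed surface needs at least one cross-cap.
        start = 0 if orientable else 1
        for g in range(start, max_genus + 1):
            f5 = pentagon_count(euler_characteristic(g, orientable))
            if f5 >= 0:
                found[(orientable, g)] = f5
    return found


if __name__ == "__main__":
    for (orientable, g), f5 in sorted(admissible_surfaces().items()):
        kind = "orientable" if orientable else "non-orientable"
        print(f"{kind}, g = {g}: f5 = {f5}")
```

Raising `max_genus` adds no further entries, since $\chi$ becomes negative for every surface beyond the four listed.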

* Handles are made from cylinders and cross-caps from twisted cylinders (see ref. [9]
for details).
The four possible classes of fullerenes are therefore spherical, toroidal, Klein-bottle and
elliptic.  No  other  surfaces  are  compatible  with  the  definition.  Toroidal  and  Klein-bottle 
fullerenes  may  also  be  called  toroidal  and  Klein-bottle  polyhexes  [10,11]  as  they  include  no 
pentagons. 

Maps on $S^2$ can be drawn as the usual Schlegel diagrams, and maps on $T^2$
and $K^2$ by identifying opposite edges of a fundamental parallelogram with appropriate
orientation. Maps on $P^2$ are more usually drawn inside a circular frame where antipodal
boundary points are to be identified. Fig. 1 shows examples of small fullerenes from the
four classes, drawn as the graph, the map and its dual triangulation in the appropriate
surface. We remark that the Petersen and Heawood graphs which appear naturally here
are actually the 5- and 6-cages (a k-cage is a trivalent graph of smallest cycle size k with
the smallest possible number of vertices); their duals in $P^2$ and $T^2$, $K_6$ and $K_7$, realise the
chromatic number of the corresponding surfaces.

Spherical and toroidal fullerenes have an extensive chemical literature, and Klein-bottle
polyhexes have been considered in several papers [11-13]. The review chapter by
Klein and Zhu [13], in particular, introduces many of the relevant concepts from surface
topology to a chemical context. Elliptic fullerenes have appeared so far only in ref. [8],
but  turn  out  to  be  related  in  a  simple  way  to  a  subclass  of  the  known  spherical  fullerenes, 
as  shown  later. 

Spherical  fullerenes 

It has been proved that at least one spherical fullerene with n vertices (modelling a
carbon molecule $C_n$) exists for all even n with $n \geq 20$ except for the case n = 22 [14].
Each fullerene polyhedron has $f_5 = 12$ pentagons and $f_6 = n/2 - 10$ hexagons. Chemical
interest centres on isolated-pentagon fullerenes, which can be constructed for n = 60 and
for all even values of $n \geq 70$ (thus with $f_6 = 20$ and $f_6 \geq 25$). Aspects of the systematics
of spherical fullerenes including chemical results are summarised in, e.g. ref. [2].

Toroidal  and  Klein-bottle  fullerenes 

$T^2$- and $K^2$-polyhexes are related to the hexagonal tessellation of the graphite sheet
in a straightforward way. The underlying surfaces are quotients of the Euclidean plane
$R^2$ under groups of isometries generated by two translations (for $T^2$) or one translation
and one glide reflection (for $K^2$). Each point of $T^2$ and $K^2$ corresponds to an orbit of
the generating group. For completeness, we note that the groups generated by a single
translation or a single glide reflection respectively give as quotients the cylinder and the
twisted cylinder (the Möbius surface). Construction and enumeration of polyhexes can
therefore be envisaged as a process of cutting parallelograms out of the graphite plane and
gluing  their  edges  according  to  the  rules  implied  in  Fig.  1. 

Some confusion exists in the mathematical and chemical literature on toroidal polyhexes.
Negami [15], Altschuler [16] and other topological graph theorists define regular
3-valent maps on the torus to mean 2-cell embeddings with all faces hexagonal, without
further qualification. Errera [17], Brahana [18], Coxeter [19] and others working in a
group-theoretical tradition use the same term in the more restricted sense of polyhexes G with
automorphism groups of the maximal possible order |Aut(G)|, in other words, those that
realise the equality in the analogue of the Weinberg bound $|Aut(G)| \leq 4e(G)$ (= 6n for
a polyhex, since 2e = 3n). These regular maps are called fully symmetric by Nakamoto in his thesis
[20]. All such fully symmetric graph embeddings are: (on $S^2$) the five Platonic polyhedra,
(on $P^2$) six graphs that include the Petersen graph and its dual, (on $K^2$) no graphs at
all [20], and (on $T^2$) the polyhexes that arise from an analogue of the Goldberg/Coxeter
construction of icosahedral $S^2$-fullerene polyhedra [21,22].

Here  we  consider  only  polyhedral  polyhexes,  i.e.  those  without  loops  or  multiple  edges 
and  where  the  intersection  of  any  two  faces  is  either  one  edge  or  is  empty.  The  dual  of  a 
toroidal  polyhedral  polyhex  is  a  triangulation  of  the  torus.  To  illustrate  the  relationship  of 
the  various  definitions,  we  give  the  counts  for  small  cases  in  Table  1,  using  data  extracted 
from  the  papers  of  Negami  [15]  and  Altschuler  [16].  The  tabulations  given  by  Kirby  [10,12] 
include  some  non-polyhedral  cases. 

In Negami’s construction, a three-parameter code [15] represents any toroidal polyhex
(or, equivalently, any 6-regular triangulation of $T^2$) as a tessellation of the hexagonal
lattice. Each graph of this type is denoted T(p,q,r), with integer parameters p, q and r,
where p is the length of a geodesic cycle of edge-sharing hexagons, r is the number of such
cycles and q is an offset.

Polyhex maps on $T^2$ are constructible for all values $f_6 \geq 3$, $n \geq 6$ [23]. At least
one polyhedral toroidal polyhex exists for all even numbers of vertices $n \geq 14$. The unique
polyhedral toroidal fullerene at n = 14 is a realisation of the Heawood graph. It has indices
(2, 1) in the Goldberg/Coxeter construction and is the dual of $K_7$, the complete graph on
seven vertices, which itself realises the 7-colour map on the torus. This map and its dual
are shown in Fig. 1. A different presentation of this and the next three toroidal polyhexes
obtainable by the Goldberg/Coxeter construction is illustrated in Fig. 2.

A description of Klein-bottle polyhexes can be developed along similar lines [20,24].
Each toroidal graph T(p,0,r) can be used to obtain two Klein-bottle 6-regular triangulations
(and hence, by dualisation, fullerenes), the handle and cross-cap types $K_h(p,r)$ and
$K_c(p,r)$, respectively. The torus is cut along a geodesic of length p. Then the handle
construction amounts to identification of opposite sides of the resulting parallelogram with
reversed direction. In the cross-cap construction, the opposite sides are each converted to
cross-caps, with slightly different rules for odd and even p. See also ref. [25] for pictures
of the two types of Klein bottle. Polyhex maps on $K^2$ are constructible for all values
$f_6 \geq 3$, $n \geq 6$ [23]. The unique smallest polyhedral Klein-bottle polyhex has 18 vertices (9
hexagonal faces) and is the dual of the tripartite $K_{3,3,3}$; the graph, the map and its dual
are shown in Fig. 1.

It will turn out to be useful for calculation of spectra later that each Klein-bottle
polyhex graph, whether of handle or cross-cap type, has a double cover among the
centrosymmetric toroidal polyhexes. In contrast with a $T^2$-polyhex, a polyhex on $K^2$ may or
may not be bipartite, i.e. spanned by two disjoint sets of vertices, black and white, such
that every white vertex is surrounded by black and vice versa.

Elliptic  fullerenes 

The torus and Klein bottle arise as quotient spaces, as described above, and this leads
to a construction of the possible polyhex maps. Within the same framework the real
projective plane arises as a quotient space of the sphere, the required group being $C_i$. The
real projective plane (also known as the elliptic plane) is obtained by identifying antipodal
points of the spherical surface; in other words, it is the antipodal quotient of the sphere.
$P^2$ is the simplest compact non-orientable surface in the sense that it can be obtained from
the sphere by adding just one cross-cap.

Clearly,  this  construction  can  be  carried  over  to  maps:  the  antipodal  quotient  of  a 
centrosymmetric  map  on  the  sphere  has  vertices,  edges  and  faces  obtained  by  identifying 
antipodal  vertices,  edges  and  faces,  thereby  halving  the  number  of  each  type  of  structural 
component. For example, the antipodal quotient of the icosahedron is $K_6$, the complete
graph  on  6  vertices,  and  that  of  the  dodecahedron  is  the  Petersen  graph,  famous  as  a 
counterexample  to  many  theorems.  The  Petersen  graph  is  not  planar  but  it  is  called 
projective-planar  in  the  sense  that  it  can  be  embedded  without  edge  crossings  in  the  real 
projective  plane. 
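
The dodecahedron example can be reproduced computationally. The following Python sketch is an illustration under stated assumptions rather than the authors' procedure: it builds the dodecahedral graph as the generalized Petersen graph GP(10, 2), pairs every vertex with its unique farthest vertex (the combinatorial antipode), and forms the quotient. Identification of the result with the Petersen graph relies on the standard fact that the Petersen graph is the only trivalent graph of girth 5 on 10 vertices. All function names are ours.

```python
from collections import deque


def dodecahedron():
    """Dodecahedral graph, built as the generalized Petersen graph GP(10, 2)."""
    adj = {v: set() for v in range(20)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for i in range(10):
        link(i, (i + 1) % 10)            # outer 10-cycle
        link(i, 10 + i)                  # spokes
        link(10 + i, 10 + (i + 2) % 10)  # inner edges with step 2
    return adj


def distances(adj, source):
    """Breadth-first distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def antipodal_quotient(adj):
    """Identify each vertex with its unique farthest vertex."""
    label = {}
    for v in adj:
        dist = distances(adj, v)
        diameter = max(dist.values())
        farthest = [w for w, d in dist.items() if d == diameter]
        if len(farthest) != 1:
            raise ValueError("vertex without a unique antipode")
        label[v] = min(v, farthest[0])   # one label per antipodal pair
    quotient = {c: set() for c in set(label.values())}
    for u in adj:
        for w in adj[u]:
            quotient[label[u]].add(label[w])
    return quotient


def girth(adj):
    """Length of the shortest cycle of a simple graph (BFS from every vertex)."""
    best = float("inf")
    for root in adj:
        dist, parent = {root: 0}, {root: None}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w], parent[w] = dist[u] + 1, u
                    queue.append(w)
                elif parent[u] != w:     # non-tree edge closes a cycle
                    best = min(best, dist[u] + dist[w] + 1)
    return best


if __name__ == "__main__":
    q = antipodal_quotient(dodecahedron())
    print(len(q), sorted(len(nb) for nb in q.values()), girth(q))
```

The quotient has half as many vertices and edges as the dodecahedron, in line with the halving of structural components described above.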

In this terminology, our definition of elliptic fullerenes amounts to selection of polyhedral
projective-planar trivalent maps with only 5- and 6-gonal faces. As noted above,
$f_5 = 6$ for these maps. Thus, the Petersen graph is an elliptic fullerene (the smallest).
Maps on $P^2$ with $f_5 = 6$ are constructible for $f_6 = 0$ and for all values $f_6 \geq 3$ [23]. Not
all of these are polyhedral. In general the elliptic fullerenes are exactly the antipodal quotients
of the centrosymmetric spherical fullerenes. From any centrosymmetric fullerene it is
possible to construct a unique elliptic fullerene by identifying antipodal vertices, and from
any elliptic fullerene it is possible to reconstruct uniquely the original centrosymmetric
fullerene.

Thus, the problem of enumeration and construction of elliptic fullerenes reduces
simply to that for centrosymmetric conventional spherical fullerenes. The point symmetry
groups that contain the inversion operation are $C_i$, $C_{nh}$ (n even), $D_{nh}$ (n even), $D_{nd}$ (n
odd), $T_h$, $O_h$ and $I_h$. A spherical fullerene may belong to one of 28 point groups [2], of
which 8 appear in the previous list: $C_i$, $C_{2h}$, $D_{2h}$, $D_{6h}$, $D_{3d}$, $D_{5d}$, $T_h$ and $I_h$. Clearly,
a fullerene $C_n$ can be centrosymmetric only if n is divisible by 4, as $f_6$ must be even
and $f_6 = n/2 - 10$. After the minimal case n = 20, it turns out that there are no
centrosymmetric fullerenes at n = 24 and n = 28, and unique examples at 32 ($D_{3d}$) and 36
($D_{6h}$). Complete enumerations of general centrosymmetric fullerenes on up to 100 atoms
and of isolated-pentagon centrosymmetric fullerenes of up to 140 atoms, taken from the
Fullerene Atlas [2], are given in Tables 2 and 3. All generate elliptic fullerenes by antipodal
identification.

It seems likely that at least one centrosymmetric fullerene exists for all doubly even
values of $n \geq 32$ and at least one centrosymmetric fullerene with isolated pentagons for
every doubly even $n \geq 92$, though we are not aware of a proof. For four of the eight point
groups, explicit conditions are known for the existence of a fullerene of the given symmetry
at a given n: $I_h$ fullerenes $C_n$ exist for all distinct solutions (i, j) of $n = 20(i^2 + ij + j^2)$
with either i = j or j = 0, and similar but more complicated conditions are known for $T_h$,
$D_{5d}$ and $D_{6h}$ fullerenes [26].
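
As a small worked illustration of the icosahedral condition, the Python sketch below lists the vertex counts allowed by $n = 20(i^2 + ij + j^2)$ with i = j or j = 0. It assumes that this is the intended reading of the formula quoted above; the function name is ours.

```python
def icosahedral_fullerene_sizes(n_max):
    """Vertex counts n <= n_max with n = 20(i^2 + ij + j^2), where j = 0 or j = i."""
    sizes = set()
    i = 1
    while 20 * i * i <= n_max:           # the j = 0 value is the smallest for each i
        for j in (0, i):
            n = 20 * (i * i + i * j + j * j)
            if n <= n_max:
                sizes.add(n)
        i += 1
    return sorted(sizes)


if __name__ == "__main__":
    print(icosahedral_fullerene_sizes(600))
```

Every value produced is divisible by 4, in agreement with the requirement that a centrosymmetric fullerene has a doubly even number of vertices.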

Some  infinite  analogues  of  fullerenes 

All the fullerenes considered so far are finite and are actually trivalent tilings with
(combinatorial) pentagons and hexagons of particular surfaces. If instead of $S^2$, $T^2$, $K^2$ or
$P^2$, we consider tilings of the Euclidean plane $R^2$, a natural definition is obtained for an
infinite fullerene analogue. Namely, a plane fullerene is a trivalent tiling of $R^2$ by combinatorial
pentagons and hexagons. Deza and Shtogrin [8] proved that the number of pentagons
in a plane fullerene is at most 6. This follows from an old result of A.D. Alexandrov [27].
It is easy to see that the plane fullerenes with $f_5 = 0$ (the graphite sheet) and $f_5 = 1$ (a
pentagonal cone) are unique. However, there is an infinity of possibilities for $2 \leq f_5 \leq 6$.
Restriction to bounded tile size eliminates pathological possibilities such as an infinite tube
capped by a hemi-dodecahedron. The restriction to trivalence is also a powerful one: by
allowing also four-valent vertices, for example, $R^2$ could be partitioned into pentagons in
a tiling with vertices of degrees three and four (Fig. 3) which would not be a fullerene in
our sense.

Other  infinite  fullerenes  would  be  given  by  trivalent  tilings  with  pentagons  and 
hexagons  of  the  cylinder,  semi-infinite  cylinder,  twisted  cylinder  etc.  Such  a  tiling  on 
the  cylinder  is  a  polyhex  and  is  the  infinite  open  nanotube. 

The tiling description also leads naturally to definition of a space fullerene as a four-valent
tiling of $R^3$ where each cell is a conventional fullerene polyhedron. It turns out that
those space fullerenes where the cells have no adjacent hexagons [$C_{20}$, $C_{24}$, $C_{26}$, $C_{28}(T_d)$] are
of special interest, though others have been constructed [8]. These space fullerenes occur in
chemistry and physics as the ‘dodecahedral family of hydrates’ [28] (clathrates) and their
duals as ‘tetrahedrally close-packed phases’ (t.c.p.) or generalised Frank-Kasper phases [29-31].
If the inventory of cells is extended to the $C_{22}$ ‘near fullerene’ (the edge-truncated
dodecahedron with one square, ten pentagonal and two hexagonal faces), other phases can
be represented, e.g. Hume-Rothery’s phase 7 has $C_{20}$ : $C_{22}$ : $C_{26}$ in the proportion 2:2:3
[31].

In the hydrate structures the ‘vertices’ are water molecules with two donor and two
acceptor hydrogen bonds. Each space fullerene gives rise to a hypothetical silicate structure
if every vertex is replaced by an $SiO_4$ tetrahedron, or a hypothetical carbon allotrope if
every vertex is replaced by a single carbon atom. Combinations tabulated by Wells [28] can
be represented in an obvious ‘chemical’ fullerene notation as $(C_{20})(C_{24})_3$, $(C_{20})_2(T_d\,C_{28})$,
$(C_{20})_3(C_{24})_2(C_{26})_2$ and $(C_{20})_5(C_{24})_8(C_{26})_2$. The first of these is illustrated in Fig. 4. A
new structure consisting of $C_{20}$-, $C_{24}$- and $C_{36}(D_{6h})$-cells is given by Deza and Shtogrin [8].
Four-valent honeycombs with fullerene and similar cells also figure in modern conjectured
solutions to the Kelvin problem of finding a partition of three-dimensional space into cells
of equal volume and minimal surface area (see also other papers in the volume containing
refs. [30] and [32]). The best foam found so far [32] is the dual of the A15 Frank-Kasper
structure and is a metric variation of $(C_{20})(C_{24})_3$.

Generalising further, the vertices could be replaced by larger entities such as tetrahedral
$C_{28}$ fullerenes bonded through their four apical atoms to make a super-fullerene
lattice.  The  3D  tilings  open  up  a  number  of  questions  of  enumeration,  characterisation 
and  spectral  structure,  to  answer  which  will  require  further  work. 

We  note  that  plane  fullerenes  can  be  seen  as  infinite  fullerene  polyhedra,  and  space 
fullerenes as an infinite analogue of a four-dimensional fullerene. A four-dimensional
fullerene (a polytopal 4-fullerene in the language of [8]) is therefore a simple 4-polytope
having only five- and six-sided two-faces. Clearly, all cells of such structures are fullerenes.

Eigenvalue  properties 

A first indication of the qualitative $\pi$-electronic structure to be expected of the new
frameworks as hypothetical forms of carbon can be gained from Hückel theory, for which
a pre-requisite is a knowledge of the adjacency properties. For orientable surfaces there is
a clear link between the spectrum of the adjacency matrix of the map and the $\pi$-orbital
energies of its realisation as a carbon framework. In the simplest Hückel model, each eigenvalue
$\lambda$ of the matrix corresponds to an orbital of energy $\alpha + \lambda\beta$, where $\alpha$ is the Coulomb
parameter, assumed the same for every site, and $\beta$ the resonance parameter, assumed
equal for all bonds. For non-orientable surfaces such as $K^2$ and $P^2$ this correspondence
is lost because the intrinsic twist in the surface introduces a phase discontinuity in the
$\pi$-basis. The relevant eigenvalues are then those of a weighted adjacency matrix, as will
be discussed below.

It should be remembered that $\pi$ energy is only one contribution to the total energy,
and in real spherical fullerenes it is dominated by the strain in the $\sigma$ system, which leads
for example to the observation that stable fullerenes are not necessarily those of maximal
Hückel energy [2]. As real carbon systems, $T^2$-fullerenes are highly strained and are unlikely
to be realised except for very large values of n; the toroidal nanotube reported by Liu et al.
[7] has a diameter of 330 - 500 nm, implying many thousands of atoms. Chemical systems
based on $K^2$- and $P^2$-polyhexes are less plausible, as these systems involve self-intersection.
Klein [11] suggests a mode of interlocking of graphitic planes that may minimise the very
considerable energetic costs, but the interest of these systems is likely to remain purely
mathematical for a long while to come.

(a)  Orientable  fullerenes 

$S^2$-fullerenes. The situation for the usual spherical fullerenes has been well explored.
Adjacency matrices of spherical fullerenes have typically more positive than negative eigenvalues,
correlating with their chemical behaviour as electron-deficient $\pi$-systems. Only occasional
examples with more negative than positive eigenvalues are known [33]. A special
subclass with equal numbers of positive and negative eigenvalues, and therefore an ‘ideal’
$\pi$-structure, is formed by the leapfrog fullerenes $C_n$, each constructed by omnicapping and
then dualising a smaller $S^2$-fullerene $C_{n/3}$ [34]. Other spherical fullerenes with exactly n/2
positive eigenvalues are possible, but are rare compared to the leapfrogs [2].


Leapfrogging  can  be  carried  out  on  any  surface,  with  characteristic  implications  for 
the  eigenvalue  spectrum.  As  an  illustration  of  the  leapfrog  construction  on  non-spherical 
surfaces,  the  leapfrogs  of  the  smallest  spherical,  toroidal,  Klein-bottle  and  elliptic  fullerenes 
are  given  in  Fig.  5.  The  striking  spectral  regularities  from  the  leapfrog  transformation  can 
be  rationalised  in  terms  of  the  way  that  relationships  between  structural  components  in  a 
parent  carry  over  to  the  leapfrog  map.  Each  face  of  the  parent  gives  rise  to  a  congruent 
but  rotated  face  in  the  leapfrog;  these  Clar  faces  are  disjoint  and  exhaust  the  vertices  of 
the  leapfrog.  All  faces  of  the  leapfrog  outside  the  Clar  set  are  hexagons  centred  on  the 
sites  of  the  parent  vertices.  Each  edge  of  the  parent  gives  rise  to  a  rotated  edge  in  the 
leapfrog;  these  Fries  edges  are  again  disjoint,  and  account  for  one  third  of  the  edges  of  the 
leapfrog  and  all  of  its  vertices.  The  Fries  edges  radiate  from  the  Clar  faces,  so  that  every 
edge  of  the  leapfrog  is  either  Fries  or  Clar  (i.e.  an  edge  of  a  Clar  face) . 
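
The bookkeeping in this paragraph can be made explicit. The Python sketch below is an illustration with hypothetical names, assuming only the relations stated above for a trivalent parent map: one Clar face per parent face, one extra hexagon per parent vertex, and one Fries edge per parent edge, making up one third of the leapfrog edges.

```python
def leapfrog_counts(n, e, f):
    """Counts for the leapfrog of a trivalent map with n vertices, e edges, f faces."""
    if 2 * e != 3 * n:
        raise ValueError("parent map must be trivalent (2e = 3n)")
    return {
        "vertices": 2 * e,    # Fries edges are disjoint and cover every vertex
        "edges": 3 * e,       # Fries edges are one third of all edges
        "faces": f + n,       # f Clar faces plus one hexagon per parent vertex
        "clar_faces": f,
        "fries_edges": e,
    }


def euler_characteristic(n, e, f):
    return n - e + f


if __name__ == "__main__":
    # Smallest spherical, toroidal and elliptic fullerenes named in the text:
    # dodecahedron, Heawood graph map, Petersen graph map.
    for name, (n, e, f) in {"dodecahedron": (20, 30, 12),
                            "Heawood": (14, 21, 7),
                            "Petersen": (10, 15, 6)}.items():
        print(name, leapfrog_counts(n, e, f))
```

Because 2e = 3n, the leapfrog has three times as many vertices as its parent and the same Euler characteristic, so it lies on the same surface; the dodecahedron, for instance, gives the 60-vertex, 32-face count of $C_{60}$.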

A consistent Kekulé structure can be built for leapfrog fullerenes on any of the four
surfaces by placing formal double bonds on the Fries edges and formal single bonds on
the Clar edges. This Fries structure has the maximum possible number of simultaneous
benzenoid hexagons, one for each vertex of the parent, giving an ideal localised electronic
structure for the neutral carbon cage $C_n$. A delocalised version of the argument uses
considerations based on the Rayleigh inequality for the distinct basis sets consisting of
all bonding (in-phase) or all anti-bonding (out-of-phase) combinations along Fries edges
[35] and shows that $S^2$-leapfrog fullerenes have no zeros and hence closed shells as neutral
molecules.

A second localised structure places a sextet of $\pi$ electrons on every Clar face and
a single bond on every Fries edge; this is a formal model of the electronic structure of
the anionic system bearing an excess of $f_5$ electrons. This too has its counterpart
in delocalised molecular-orbital theory, where the extra 12 electrons of a leapfrog
$S^2$-fullerene anion occupy low-lying anti-bonding orbitals of translational and rotational
symmetry [36].

$T^2$-fullerenes. Eigenvalue spectra of toroidal polyhexes have been studied in some
detail. Kirby et al. [10] give an explicit formula for calculation of the set of eigenvalues in
terms of canonical lattice-vector parameters. $T^2$-polyhexes have symmetric spectra (with
both $+\lambda$ and $-\lambda$ occurring for every eigenvalue $\lambda$), as they are bipartite graphs, and all
eigenvalues $\lambda$ except ±3 and ±1 have even multiplicity. The special eigenvalue $\lambda = 0$
is governed by a simple pattern: exactly those toroidal polyhedral polyhexes that are
leapfrogs have open shells, with four zero eigenvalues at positions n/2 - 1, n/2, n/2 + 1
and n/2 + 2 in the spectrum [37]. The spectra of toroidal polyhexes are also intimately
related to those of spherical triangle and hexagon polyhedra, and explain endospectral
regularities in the latter series [38].

(b)  Non-orientable  fullerenes 

As warned earlier, there is a subtlety in the application of Hückel theory to frameworks
embedded in non-orientable surfaces. The problem arises as follows. The $\pi$-basis
consists of a p function on every participating carbon atom, directed along the normal to
the surface, and Hückel energies are obtained by diagonalising a Hamiltonian matrix whose
entries are pairwise integrals of these (vector-like) functions. Under the usual assumptions,
only  functions  on  nearest  neighbours  are  involved.  For  an  orientable  surface,  neglecting 
curvature,  neighbouring  normals  are  parallel  and  the  integrals  are  therefore  proportional 
to  entries  in  the  adjacency  matrix.  However,  for  a  non-orientable  surface  made  by  gluing 
edges  of  a  patch  together  with  a  twist,  p  functions  neighbouring  across  the  join  become 
antiparallel  and  for  such  pairs  the  integral  is  reversed  in  sign  (see  Fig.  6) . 

A standard chemical example occurs in the theory of Möbius transition states for
pericyclic processes, where eigenvalues for cycles with one phase interruption are found by
diagonalising a weighted adjacency matrix that has an entry −1 for one link and +1 for all
others. The spectrum of the Möbius cycle can be found by rotation of the usual geometric
construction for untwisted cycles [39]. In the present case, an analogous procedure can be
adopted: to construct the dimensionless Hückel Hamiltonian matrix, H, take the adjacency
matrix A of the graph and multiply by −1 all entries for edges that cross a twisted boundary
(in the case of K²) or cross the circular boundary (in the case of P²). Signs for any edges
terminating at or lying within a boundary can be decided by making small shifts to bring
their vertices inside the boundary.
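
The Möbius-cycle case can be checked numerically. The sketch below is an illustration only, not part of the original work: it builds the dimensionless matrix H for an n-cycle with one link weighted −1 and confirms that the standard closed form 2cos((2k+1)π/n) gives exact eigenpairs (the untwisted cycle uses 2cos(2kπ/n)). The function names are hypothetical, and the construction assumes n ≥ 3.

```python
import cmath
import math

def huckel_cycle(n, twisted=False):
    """Dimensionless Hückel matrix of an n-cycle (n >= 3).

    If twisted, the link closing the ring (n-1 to 0) is weighted -1,
    all other links +1, as for a Möbius cycle.
    """
    h = [[0] * n for _ in range(n)]
    for j in range(n):
        k = (j + 1) % n
        weight = -1 if (twisted and k == 0) else 1
        h[j][k] = h[k][j] = weight
    return h

def residual(h, vec, lam):
    """Largest component of H.vec - lam.vec (zero for an exact eigenpair)."""
    n = len(h)
    return max(abs(sum(h[r][c] * vec[c] for c in range(n)) - lam * vec[r])
               for r in range(n))

def cycle_spectrum(n, twisted=False):
    """Eigenvalues 2cos(theta_k); the twist shifts every angle by pi/n."""
    h = huckel_cycle(n, twisted)
    shift = 1 if twisted else 0
    values = []
    for k in range(n):
        theta = (2 * k + shift) * math.pi / n
        vec = [cmath.exp(1j * theta * j) for j in range(n)]
        lam = 2 * math.cos(theta)
        # Verify the closed form against the explicit matrix.
        assert residual(h, vec, lam) < 1e-9
        values.append(lam)
    return sorted(values)
```

For the six-membered ring the untwisted call reproduces the familiar benzene pattern, while the twisted four-membered ring gives two degenerate pairs at ±√2.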

K²-fullerenes. Some calculations of eigenvalues of unweighted adjacency matrices
of Klein-bottle polyhexes have been reported [40] and compared with those of toroidal
polyhexes, but a general picture of the spectra for these systems has not been given.

The Klein-bottle surface can be obtained by identifying diametrically opposite points
of the torus, i.e. by collapsing each point and its antipode [41]. The point groups available
to the covering torus are at most the centrosymmetric subgroups of D∞h, i.e. Dnh and Cnh
(n even), Dnd and S2n (n odd), Ci, though, as with spherical fullerenes, some of the lower
groups may not be realisable. Hence, each K²-polyhex on n vertices is doubly covered by
a centrosymmetric T²-polyhex on 2n vertices, and is therefore a divisor [42] of the larger
graph. By a centrosymmetric graph, we mean, as usual, a graph that has a centrosymmetric
setting on the appropriate surface, i.e. has centrosymmetric maximal symmetry. The
weighted K²-polyhex represented by the H matrix is the co-divisor. Any given eigenvector
of the adjacency matrix of the larger T²-polyhex therefore corresponds to an eigenvalue
in the spectrum of A (H) of the smaller K²-graph if the antipodal vertices carry equal
(opposite) coefficients. Eigenvectors of the covering graph can always be projected to
display this gerade/ungerade symmetry. Fig. 7 shows the construction of double covers
for two small Klein-bottle polyhexes. Given a K²-polyhex, the double cover can always be
constructed, reduced to canonical form and its spectrum partitioned into that of A(K²)
and H(K²).
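
The divisor/co-divisor relation can be illustrated on a toy signed graph. The sketch below is an assumption-laden illustration rather than the authors' procedure: it uses a hypothetical 4-cycle with one edge crossing the twist (not a polyhex), lifts it to a two-sheeted cover in which crossing edges connect the sheets, and checks that a vector lifted with equal coefficients on paired vertices is acted on by A, while a vector lifted with opposite coefficients is acted on by H.

```python
def double_cover(a_plus, a_minus):
    """Adjacency matrix of the two-sheeted cover.

    a_plus  : 0/1 matrix of edges that do not cross the twisted boundary
    a_minus : 0/1 matrix of edges that do cross it
    Vertex v lifts to v and v + n; crossing edges join the two sheets.
    """
    n = len(a_plus)
    cover = [[0] * (2 * n) for _ in range(2 * n)]
    for r in range(n):
        for c in range(n):
            cover[r][c] = cover[r + n][c + n] = a_plus[r][c]
            cover[r][c + n] = cover[r + n][c] = a_minus[r][c]
    return cover

def matvec(m, v):
    """Plain matrix-vector product."""
    return [sum(row[c] * v[c] for c in range(len(v))) for row in m]

def combine(a_plus, a_minus, sign):
    """A = A+ + A- (sign=+1) or H = A+ - A- (sign=-1)."""
    return [[p + sign * q for p, q in zip(rp, rq)]
            for rp, rq in zip(a_plus, a_minus)]

# Hypothetical toy example: 4-cycle whose edge 3-0 crosses the twist.
A_PLUS = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
A_MINUS = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
A = combine(A_PLUS, A_MINUS, +1)
H = combine(A_PLUS, A_MINUS, -1)
COVER = double_cover(A_PLUS, A_MINUS)

x = [1, 2, 3, 4]                                # arbitrary trial vector
gerade = matvec(COVER, x + x)                   # equal coefficients on pairs
ungerade = matvec(COVER, x + [-v for v in x])   # opposite coefficients
```

Because the identities hold for any trial vector, an eigenvector of A lifts to a gerade eigenvector of the cover with the same eigenvalue, and an eigenvector of H lifts to an ungerade one, which is the partition of the covering spectrum described above.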

A simple pattern can be observed in the partitioned spectra. Consider a given eigenvector
|λ⟩ with eigenvalue λ in the toroidal double cover. As the covering T²-polyhex is
bipartite, a change of sign of the coefficient on every vertex of one partite set generates
an eigenvector |−λ⟩ of eigenvalue −λ. On collapsing pairs of covering vertices, |+λ⟩
will generate an eigenvector of either A or H, as will |−λ⟩. Two different situations are
possible:

(i) the K²-polyhex itself is bipartite: |+λ⟩ and |−λ⟩ yield eigenvectors with eigenvalues
belonging to one and the same subspectrum, either the spectrum of A or the spectrum
of H. For example, in this case, the A spectrum always contains λ = 3 and λ = −3,
whereas the H spectrum contains neither.

(ii) the K²-polyhex itself is non-bipartite: |+λ⟩ and |−λ⟩ yield eigenvectors with
eigenvalues belonging to different subspectra, one to the spectrum of A and one to
that of H. As a consequence, the two subspectra are exact reversals of one another.
In particular, the A spectrum always contains λ = +3 and the H spectrum λ = −3.

The origin of the properties (i) and (ii) is readily explained. Take the patch of
hexagons that generates the K²-polyhex, and colour its vertices alternately black and
white. Further, take an eigenvector |λ⟩ of the covering torus that yields a vector of the
same eigenvalue for A. |λ⟩ will have the property that any one pair of antipodal vertices
on the torus share a coefficient. Now attach to each vertex of the patch the common
value of the coefficient from its covering pair, to produce a self-consistent eigenvector of A,
|λ(A)⟩. If the K²-polyhex is bipartite (case (i)), reversal of the coefficients of all black
vertices of the patch gives another self-consistent eigenvector of A, |−λ(A)⟩. However,
if the K²-polyhex is non-bipartite (case (ii)), then when the patch is joined up to make the
Klein bottle each edge of the graph that crosses the twisted boundary will join vertices of
like colour, since it is these edges that destroy the alternating pattern of the planar patch.
In such a case, weighting these edges by −1 and simultaneously reversing all coefficients
on the vertices of one colour gives an eigenvector of H with eigenvalue −λ, i.e. |−λ(H)⟩.
QED.

A leapfrog rule can be derived for K²-polyhexes. Consider a patch cut from the
hexagonal tessellation of the plane, such that it can be rolled up to give both leapfrog T²-
and K²-polyhexes. This possibility implies that the vertices of the patch are spanned by
Clar hexagons. Fig. 8 shows such a patch. When wrapped as a torus, the polyhex must
have four zero-eigenvalue vectors, as it is a leapfrog. Fig. 8 (c)-(f) shows their explicit
form. In terms of the Fries edges, bonding and anti-bonding spaces each contribute
two adjacency eigenvectors of zero eigenvalue. Inspection of the figure shows that different
subsets of exactly two of the four survive as eigenvectors of either A or H matrices when
the same polyhex is glued as a K²-fullerene. The Rayleigh inequality arguments used in
refs. [35,43] show that, as bonding and anti-bonding spaces each contribute one zero in the
Klein-bottle form, the two zeroes are eigenvalues n/2 and n/2 + 1, i.e. HOMO and LUMO
of the hypothetical neutral carbon framework with this topology. The general chemical
conclusion is that all leapfrog K²-polyhexes, whether treated as weighted or unweighted
Hückel problems, have open shells, with two electrons in two non-bonding π orbitals.

A final feature of the leapfrog transformation is illustrated by Fig. 9, where the
graphs of the 18-vertex K²-fullerene, its leapfrog and double leapfrog are superimposed.
Leapfrogging switches the character of the graph from bipartite to non-bipartite and back
again. This is a result of a switch in parity of geodesic cycles, even though all faces remain
hexagonal, and is part of a more general pattern: leapfrogging a cubic graph with all faces
even switches the bipartite character on the non-orientable surfaces (K² and P²) but leaves
it unchanged on the orientable surfaces (S² and T²).

P²-fullerenes. The eigenvalue spectra, both weighted and unweighted, of an elliptic
fullerene are immediately available from the adjacency spectrum of its centrosymmetric
parent. The eigenvalues of A for a P²-fullerene are just those of the parent that correspond
to gerade eigenvectors; eigenvalues of H correspond to ungerade eigenvectors of the parent.
Together, the weighted and unweighted spectra of the P²-fullerene sum to the spectrum
of the parent, since the smaller graph is a divisor of the larger.

A simple consequence is that any P²-fullerene derived from a leapfrog spherical parent
has a properly closed shell as a neutral π system. Proof: the leapfrog parent Cn has n/2
positive and n/2 negative eigenvalues [35]. Its bonding eigenvectors span the permutation
representation of the Fries edges [34,36]. This representation has character zero under
inversion as all edges shift under this operation. Hence, the centrosymmetric leapfrog
S²-fullerene Cn has n/4 gerade and n/4 ungerade bonding eigenvectors. Therefore, the spectra
of A and H of the derived P²-fullerene will each have n/4 bonding, n/4 anti-bonding
and zero non-bonding eigenvectors, QED. As the operation that derives P²- from
S²-fullerenes commutes with the leapfrog transformation, this argument proves that leapfrog
P²-fullerenes have properly closed shells.

Conclusion 

This paper has explored the extension of the fullerene concept from the sphere to
other surfaces, retaining trivalence and the limitation to face sizes five and six, showing that
the chemical species exist within a mathematical context of a limited set of possibilities
for tiling. Consideration of the extended set of surfaces also gives a context to the
magic-number rules of Hückel theory, such as the leapfrog rule for closed-shell S²-fullerenes, as
it turns out that leapfrog polyhedra have distinct but predictable properties on all four
surfaces. Extensions to other face sizes and more exotic surfaces will allow description of
many more variants on the sphere, some of which are promising as candidates for carbon
polyhedra or infinite solids.

Acknowledgment 

The  authors  thank  EPSRC  (UK)  and  the  EU  TMR  Network  Contract  FMRX-CT98-0192 
‘BIOFULLERENES’  for  financial  support  of  this  research. 

References 

1. T. Braun, A. Schubert, H. Maczelka and L. Vasvári, Fullerene research 1985-1993,
World Scientific, Singapore, 1995.

2.  P.W.  Fowler  and  D.E.  Manolopoulos,  An  atlas  of  fullerenes,  Oxford  University  Press, 
Oxford,  1995. 

3. R. Taylor, ed., The chemistry of fullerenes, World Scientific, Singapore, 1995.

4. M.S. Dresselhaus, G. Dresselhaus and P.C. Eklund, Science of fullerenes and carbon
nanotubes, Academic Press, San Diego, 1996.

5. P.W. Fowler, D.E. Manolopoulos, G. Orlandi and F. Zerbetto, Energetics and isomerisation
pathways of a lower fullerene: the Stone-Wales map for C40, J. Chem. Soc.
Faraday Trans. 91 (1995) 1421-1423.

6. A. Ayuela, P.W. Fowler, D. Mitchell, R. Schmidt, G. Seifert and F. Zerbetto, C62:
theoretical evidence for a non-classical fullerene with a heptagonal ring, J. Phys. Chem.
100 (1996) 15634-15636.

7. J. Liu, H. Dai, J.H. Hafner, D.T. Colbert, R.E. Smalley, S.J. Tans and C. Dekker,
Fullerene crop circles, Nature 385 (1997) 780-781.

8.  M.  Deza  and  M.I.  Shtogrin,  Three-,  four-  and  five-dimensional  fullerenes,  SE  Asian 
Bull.  Math.  23  (1999)  1-10. 

9. J.L. Gross and T.W. Tucker, Topological graph theory, Wiley, New York, 1987.

10. E.C. Kirby, R.B. Mallion and P. Pollak, Toroidal polyhexes, J. Chem. Soc. Faraday
Trans. 89 (1993) 1945-1953.

11.  D.J.  Klein,  Elemental  benzenoids,  J.  Chem.  Inf.  Comp.  Sci.  34  (1994)  453-459. 

12. E.C. Kirby, Recent works on toroidal and other exotic fullerene structures, in: From
chemical topology to 3-dimensional geometry, A.T. Balaban, ed., Plenum Press, New
York, 1997, Chapter 8, 263-296.

13. D.J. Klein and H. Zhu, All-conjugated carbon species, in: From chemical topology to
3-dimensional geometry, A.T. Balaban, ed., Plenum Press, New York, 1997, Chapter
9, 297-341.

14. B. Grünbaum and T.G. Motzkin, The number of hexagons in and the simplicity of
geodesics on certain polyhedra, Can. J. Math. 15 (1963) 744-751.

15.  S.  Negami,  Uniqueness  and  faithfulness  of  embedding  of  toroidal  graphs,  Discrete 
Math.  44  (1983)  161-180. 

16. A. Altshuler, Construction and enumeration of regular maps on the torus, Discrete
Math. 4 (1973) 201-217.

17. A. Errera, Sur les polyhedres reguliers de l’analysis situs, Acad. Roy. Belg. Cl. Sci.
Mem. Coll. 8(2) 7 (1922) 1-17.

18.  H.R.  Brahana,  Regular  maps  on  the  anchor  ring,  Amer.  J.  Math.  48  (1926)  225-240. 

19. H.S.M. Coxeter and W.O.J. Moser, Generators and relations for discrete groups, 2nd
edition, Springer, Berlin, 1965.

20. A. Nakamoto, Triangulations and quadrangulations of surfaces, D.Sc. thesis, Dept. of
Mathematics, Keio University, Japan, 1996.

21. M. Goldberg, A class of multi-symmetric polyhedra, Tohoku Math. J. 43 (1937)
104-108.

22. H.S.M. Coxeter, Virus macromolecules and geodesic domes, in: A spectrum of
mathematics, J.C. Butcher, ed., Oxford University Press/Auckland University Press,
Oxford/Auckland, 1971, 98-107.

23. F. Plastria, On the number of hexagons in cubic maps, Report BEIF/98, Centrum
voor Bedrijfsinformatie, Vrije Universiteit Brussel, 1998.

24. S. Negami, Classification of 6-regular Klein bottle graphs, Res. Rep. Inf. Sci. Tokyo
Institute of Technology A96 (1984).

25.  J.  Stillwell,  Classical  topology  and  combinatorial  group  theory,  2nd  edition,  Springer, 
Berlin,  1993  (see  pp.  65-67). 

26. P.W. Fowler, J.E. Cremona and J.I. Steer, Systematics of bonding in non-icosahedral
carbon clusters, Theor. Chim. Acta 73 (1988) 1-26.

27.  See,  for  example:  A.D.  Alexandrov,  Convex  Polyheder,  Akademie-Verlag,  Berlin,  1958, 
p.  92. 

28. A.F. Wells, Structural Inorganic Chemistry, 4th edition, Oxford University Press,
Oxford, 1975.

29.  D.P.  Shoemaker  and  C.B.  Shoemaker,  Concerning  the  relative  numbers  of  atomic 
coordination  types  in  tetrahedrally  close-packed  metal  structures,  Acta  Cryst.  B42 
(1986)  3-11. 

30. N. Rivier and T. Aste, Organised packing, in: The Kelvin problem, D. Weaire, ed.,
Taylor & Francis, London, 1996, 61-68 (see especially Table 1).

31. J.F. Sadoc and R. Mosseri, Frustration géométrique, Eyrolles, Paris, 1997 (see Chapter
7, especially Table 1).

32. R. Kusner and J.M. Sullivan, Comparing the Weaire-Phelan equal-volume foam to
Kelvin’s foam, in: The Kelvin problem, D. Weaire, ed., Taylor & Francis, London,
1996, 71-80.

33. P.W. Fowler, Fullerene graphs with more negative than positive eigenvalues: the
exceptions that prove the rule of electron deficiency?, J. Chem. Soc. Faraday Trans. 93
(1997) 1-3.

34.  P.W.  Fowler  and  J.I.  Steer,  The  leapfrog  principle:  a  rule  for  electron  counts  of  carbon 
clusters,  J.  Chem.  Soc.  Chem.  Comm.  (1987)  1403-1405. 

35.  D.E.  Manolopoulos,  D.R.  Woodall  and  P.W.  Fowler,  Electronic  stability  of  fullerenes: 
eigenvalue  theorems  for  leapfrog  carbon  clusters,  J.  Chem.  Soc.  Faraday  Trans.  88 
(1992)  2427-2435. 




36.  P.W.  Fowler  and  A.  Ceulemans,  Electron  deficiency  of  the  fullerenes,  J.  Phys.  Chem. 
99  (1995)  508-510. 

37.  M.  Yoshida,  M.  Fujita,  P.W.  Fowler  and  E.C.  Kirby,  Non-bonding  orbitals  in  graphite, 
carbon  tubules,  toroids  and  fullerenes,  J.  Chem.  Soc.  Faraday  Trans.  93  (1997)  1037- 
1043. 

38. P.W. Fowler, P.E. John and H. Sachs, (3,6) cages, hexagonal toroidal cages and their
spectra, in: Discrete Mathematical Chemistry, P. Hansen, P.W. Fowler and M. Zheng,
eds., DIMACS Series on Discrete Mathematics and Theoretical Computer Science,
AMS, 1999.

39. For a simple geometric construction of the spectrum, see for example: G.B. Gill and
M.R. Willis, Pericyclic reactions, Chapman and Hall, London, 1974.

40.  E.C.  Kirby,  Remarks  upon  recognising  genus  and  possible  shapes  of  chemical  cages  in 
the  form  of  polyhedra,  tori  and  Klein  bottles,  Croat.  Chem.  Acta  68  (1995)  269-282. 

41. D. Hilbert, S. Cohn-Vossen, Geometry and the imagination, 2nd edition, Chelsea
Publishing Co., New York, 1990 (see Fig. 300, p. 312).

42.  D.M.  Cvetkovic,  M.  Doob  and  H.  Sachs,  Spectra  of  graphs:  theory  and  application, 
Academic  Press,  New  York,  1979. 

43.  P.W.  Fowler  and  K.M.  Rogers,  Eigenvalue  spectra  of  leapfrog  polyhedra,  J.  Chem. 
Soc.  Faraday  Trans.  94  (1998)  2509-2514. 




Table 1: Enumeration of toroidal polyhexes. f6 is the number of hexagonal faces. The
count of regular polyhexes is for 2-cell embeddings of trivalent maps with all faces
hexagonal. Polyhedral polyhexes are those with dual triangulations in which the intersection of
any two faces is either a vertex or is empty. The final rows give the counts and canonical
lattice-vector parameters for the restricted class of regular polyhexes of maximal
automorphism group.

f6                     1      2      3      4      5      6      7      8      9     10     11     12     13     14
Regular                1      1      2      3      2      3      3      5      4      4      3      8      4      5
Polyhedral             -      -      -      -      -      -      1      1      2      1      1      4      2      2
Fully symmetric        1      -      1      1      -      -      1      -      1      -      -      1      1      -
Lattice vector     (1,0)      -  (1,1)  (2,0)      -      -  (2,1)      -  (3,0)      -      -  (2,2)  (3,1)      -

Table 2: Centrosymmetric fullerenes Cn (20 ≤ n ≤ 100) with and without pentagon
adjacencies. Each is the parent of an elliptic P²-fullerene.

    n    Ci   C2h   D2h   D6h   D3d   D5d    Th    Ih  Total
   20     0     0     0     0     0     0     0     1      1
   24     0     0     0     0     0     0     0     0      0
   28     0     0     0     0     0     0     0     0      0
   32     0     0     0     0     1     0     0     0      1
   36     0     0     0     1     0     0     0     0      1
   40     0     0     1     0     0     2     0     0      3
   44     0     0     0     0     3     0     0     0      3
   48     0     1     2     0     0     0     0     0      3
   52     0     1     2     0     0     0     0     0      3
   56     1     2     1     0     2     0     0     0      6
   60     0     4     1     2     0     1     0     1      9
   64     2     4     1     0     0     0     0     0      7
   68     0     7     2     0     3     0     0     0     12
   72     3     7     5     0     0     0     0     0     15
   76     2    11     0     0     0     0     0     0     13
   80     5    16     0     0     2     2     0     1     26
   84     9     4     6     2     1     0     0     0     22
   88    10    16     5     0     0     0     0     0     31
   92    12    13     4     0     5     0     1     0     35
   96    20    16     3     2     1     0     0     0     42
  100    14    28     2     0     0     2     0     0     46
Total    78   130    35     7    18     7     1     3    279

Table 3: Centrosymmetric fullerenes Cn (60 ≤ n ≤ 140) with isolated pentagons. Each
is the parent of an elliptic P²-fullerene.

    n    Ci   C2h   D2h   D6h   D3d   D5d    Th    Ih  Total
   60     0     0     0     0     0     0     0     1      1
   72     0     0

Captions to Figures


Figure 1: Smallest spherical, toroidal, Klein-bottle and elliptic fullerenes. The first
column lists the graphs drawn in the plane, the second the map in the appropriate surface
and the third the dual in the same surface. The examples are (a) the dodecahedron (dual
= icosahedron), (b) the Heawood graph (dual = K7), (c) a small Klein-bottle polyhex
(dual = K3,3,3) and (d) the Petersen graph (dual = K6).

Figure 2: Examples of small polyhedral toroidal polyhexes presented as benzenoids
with glued vertices (indicated on the periphery). These are the polyhexes with
Coxeter/Goldberg codes (2,1), (3,0), (2,2), (3,1). The map (2,1) is the Heawood graph,
which is drawn in two different presentations in Fig. 1.

Figure 3: A tiling of the plane with combinatorial pentagons alone in which all vertices
are of degree 3 or 4.

Figure 4: The smallest space fullerene: an assembly of 20- and 24-vertex spherical
fullerenes in ratio 2 : 6 (see Wells [28]). Different metric variations of this structure appear
as a clathrate, and as the best Kelvin foam. Its dual is A15, the structure of β-tungsten
and Cr3Si.

Figure 5: Leapfrogs of the smallest polyhedral fullerenes on the surfaces S², T²,
K² and P².

Figure 6: Construction of a non-orientable surface such as a Möbius strip by twisting and
gluing a planar system brings together p orbitals of opposite phase across the seam of the
twist.

Figure 7: Construction of toroidal double covers of Klein-bottle polyhexes: (a) shows a
bipartite K²-polyhex on 18 vertices which is covered by a centrosymmetric D^d torus on
36 vertices; (b) shows a non-bipartite K²-polyhex on 24 vertices, covered by a 48-vertex,
centrosymmetric D6h torus. The steps in the construction are: (i) copy the Klein-bottle
polyhex by a simple translation; (ii) flip the second copy by 180° about the translation
vector; (iii) fuse the two copies. The arrows indicate edge identifications, and members of
a pair of covering vertices are marked with the same symbol (filled circle or square). Note
that in case (a), the two covering vertices are to be found in the same partite set on the
torus, whereas in case (b) they are not.

Figure 8: Origin of zero eigenvalues in toroidal and Klein-bottle leapfrog polyhexes. An
18-vertex polyhex is shown in (a), numbered as for connection either as a T²-fullerene or
on K², with the twist occurring on gluing left and right edges of the parallelogram. This
polyhex is a leapfrog, as shown in (b) where shading identifies Clar faces and a double
bond a Fries edge. Edges crossing the seam of the twist and therefore to be weighted by
−1 in the Hückel problem on K² are in bold. When the polyhex is connected on T², all four
vectors (c) to (f) represent eigenvectors of A with zero eigenvalue. When it is connected on
K², (c) and (e) are zero-eigenvalue eigenvectors for A and (d) and (f) are zero-eigenvalue
eigenvectors for H, the weighted adjacency matrix. Details of the construction of (c) to
(f) from local bonds and anti-bonds on Fries edges are given in ref. [43].

Figure 9: Multiple leapfrogs of a K²-polyhex. The parent 18-vertex graph (thick lines)
is leapfrogged to 54 and then to 162 vertices (thin lines). Only the first and third graphs
are bipartite: when the patch is glued as indicated by the arrows, black vertices give a
consistent partite set (filled circles) in the parent and double leapfrog but not in the single
leapfrog.
1. INTRODUCTION


Molecular Dynamics (MD) has been applied to biomolecular systems and has been
shown to be capable of elucidating their energetic, dynamical and statistical mechanical
properties. MD is a deterministic procedure in which the atoms in a molecule move
according to classical mechanics. Thus, in an MD calculation we have to integrate
Newton’s equations of motion over time for the N atoms of the molecular system
being studied

$$\frac{\partial^2 \mathbf{r}(i,t)}{\partial t^2} = m(i)^{-1}\,\mathbf{F}(i,t), \qquad i = 1,\ldots,N \qquad (1)$$

with F(i,t) = −∂V(r(1,t),...,r(N,t))/∂r(i,t) as the force on atom i at time t,
V(r(1,t),...,r(N,t)) as the potential energy function, r(i,t) as the position of atom i at time
t and m(i) as the mass of atom i. In the present work, the leap-frog algorithm (van
Gunsteren and Berendsen, 1990) has been used to compute the position vectors r(i,t)
using the forces and previous positions of the atoms at a series of times separated by
intervals Δt.
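
As an illustration of how eq. (1) is integrated, the following is a minimal sketch of a leap-frog step loop. It is not the authors' code: it assumes one scalar coordinate per particle, velocities stored at half steps, and a user-supplied `force` function, all of which are hypothetical choices made for brevity.

```python
import math

def leapfrog(positions, velocities_half, masses, force, dt, n_steps):
    """Integrate Newton's equations with the leap-frog scheme.

    positions       : r(t) for each particle (one scalar coordinate each)
    velocities_half : v(t - dt/2) for each particle
    force           : function mapping a position list to a force list
    Returns the list of conformations, including the starting one.
    """
    r = list(positions)
    v = list(velocities_half)
    trajectory = [list(r)]
    for _ in range(n_steps):
        f = force(r)
        # v(t + dt/2) = v(t - dt/2) + dt * F(t) / m
        v = [vi + dt * fi / mi for vi, fi, mi in zip(v, f, masses)]
        # r(t + dt) = r(t) + dt * v(t + dt/2)
        r = [ri + dt * vi for ri, vi in zip(r, v)]
        trajectory.append(list(r))
    return trajectory

def harmonic_force(r):
    """Toy potential V = r^2 / 2 per particle, so F = -r."""
    return [-ri for ri in r]

# One particle starting at r = 1 with zero velocity at t = 0;
# the half-step velocity v(-dt/2) is estimated as v(0) - (dt/2) * a(0).
DT = 0.01
traj = leapfrog([1.0], [0.5 * DT], [1.0], harmonic_force, DT, 628)
```

Each entry of `traj` plays the role of one conformation, so the returned list corresponds to a (one-coordinate) trajectory in the sense used below.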

The set of atomic positions occupied at a given time tj is called a conformation
(vector r(i,tj), i=1,...,N) and a succession of conformations, in n time intervals, is named
a trajectory (matrix [r(i,tj)], i=1,...,N j=1,...,n). For simplicity, we shall use r(i,j),
i=1,...,N to represent a conformation and [r(i,j)], i=1,...,N j=1,...,n to represent a
trajectory. A trajectory is thus given by the following matrix

r(1,1)   r(2,1)   ...   r(N,1)   =>   1st conformation
r(1,2)   r(2,2)   ...   r(N,2)   =>   2nd conformation          (2)
  ...
r(1,n)   r(2,n)   ...   r(N,n)   =>   nth conformation

The trajectories generated by an MD calculation are the basis for the calculation
of the system properties. However, the corresponding binary trajectory files are
large and create storage problems. The classical lossless compression
algorithms, such as Huffman coding (Huffman, 1952) used in the pack compression
utility, adaptive Huffman (Gallager, 1978) used in compact, LZW (Welch, 1984) used
in compress and LZ77 (Ziv and Lempel, 1977) used in gzip, give poor efficiencies in the
compression of this type of file. Therefore, specific lossy algorithms, which significantly
increase the compression efficiency while preserving a high degree of precision, are of
great importance for this problem.

This work introduces a new specific algorithm, named Byte
Structure Variable Length Coding (BS-VLC), which significantly improves on the
compression efficiencies of the best classical lossless algorithms while preserving a high
degree of precision. This algorithm was used in the compression of trajectory files
generated by MD applied to the biological systems trypsin and trypsin:PTI complex.
2. METHODS


2.1.  COMPRESSION  WITH  THE  BS-VLC  ALGORITHM 

The initial data in this process is the trajectory matrix [r(i,j)], i=1,...,N j=1,...,n,
which in Cartesian space can be replaced by three matrices [ru(i,j)] with u = x, y or
z, respectively, and i=1,...,N j=1,...,n. The trajectories are usually stored in binary files
with every coordinate stored as a 4-byte real number. All of this data passes through
three steps in a BS-VLC compression: pre-processing, quantization and variable length
coding. These are described separately below.

Pre-processing  The methodology proposed here is based on the conversion of the
initial data into differential trajectory matrices [Δru(i,j)], i=1,...,N j=1,...,n, whose
components are given by coordinate differences, i.e.

$$\Delta r_u(i,j) = r_u(i,j) - r_u^{\mathrm{ref}}(i,j) \qquad (3)$$

where r_u^ref(i,j) is the reference coordinate associated with ru(i,j). It is necessary to store
n0 conformations (the standard integral conformations [ru(i,k)], i=1,...,N k=1,...,n0) in their
original form to allow the trajectory matrices [ru(i,j)] to be rebuilt from the
corresponding differential matrices [Δru(i,j)] during decompression. The number of
standard integral conformations depends on the criteria used in the selection of the
reference coordinates r_u^ref(i,j) and will be discussed later in section 2.4.
Quantization  This information is further transformed into integer differential
trajectory matrices [u(i,j)], i=1,...,N j=1,...,n, with components given by the coordinate
differences multiplied by a scaling factor (Scale) and converted to integers, as follows

$$u(i,j) = \Delta r_u(i,j) \times \mathrm{Scale} \qquad (4)$$

The higher the scaling factor, the higher the precision of the process and the
lower the compression efficiency; conversely, lowering the scaling factor increases the
compression efficiency at the cost of lower precision. It is important to find a compromise
that provides good compression coupled with a reasonable error.
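
The pre-processing and quantization steps can be sketched together as follows. This is an illustration under stated assumptions, not the published implementation: the first conformation is taken as the single stored reference (the actual selection criteria belong to section 2.4), conversion to integers is done by rounding to nearest, and the function names are hypothetical.

```python
def quantize(trajectory, scale):
    """Differential pre-processing (eq. 3) followed by quantization (eq. 4).

    trajectory : list of conformations, each a flat list of coordinates
    Assumption: the first conformation is the reference for all others,
    and integers are obtained by rounding to the nearest value.
    """
    reference = list(trajectory[0])
    integers = [[round((x - x0) * scale) for x, x0 in zip(conf, reference)]
                for conf in trajectory[1:]]
    return reference, integers

def dequantize(reference, integers, scale):
    """Rebuild approximate coordinates from the stored reference."""
    rebuilt = [list(reference)]
    for row in integers:
        rebuilt.append([x0 + u / scale for u, x0 in zip(row, reference)])
    return rebuilt

def max_error(original, rebuilt):
    """Largest absolute coordinate error introduced by quantization."""
    return max(abs(a - b)
               for conf_a, conf_b in zip(original, rebuilt)
               for a, b in zip(conf_a, conf_b))

# Toy trajectory: 3 conformations of 2 coordinates each (made-up numbers).
TRAJ = [[1.000, 2.000], [1.0123, 1.9871], [1.0301, 1.9654]]
SCALE = 1000
ref, ints = quantize(TRAJ, SCALE)
error = max_error(TRAJ, dequantize(ref, ints, SCALE))
```

With rounding to nearest, the reconstruction error per coordinate is bounded by 0.5/Scale, which makes the trade-off described above concrete: multiplying Scale by ten gains one decimal digit of precision but enlarges the integers that must be coded.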

Variable  Length  Coding  (VLC)  Subsequently,  the  elements  of  the  integer 
differential  matrices  [u(i,j)]  are  compressed  using  a  variable  length  code,  where  the 
trajectories  are  subdivided  into  sets  (structures)  which  are  separately  coded. 

2.2.  CODING  WITHIN  THE  BS-VLC  ALGORITHM 

Byte integer signal coding  The integers that make up the integer differential
trajectory matrices [u(i,j)] can be represented by implicit signal coding or explicit
signal coding, where "signal" denotes the sign of the integer. The former is the
conventional form of partitioning integers into bytes, i.e., 1 bit is always kept to
represent the signal and the other N-1 bits represent the respective absolute value; in
the latter, signal bits are explicitly codified and grouped into signal bytes. The explicit
representation can be less expensive if the data is concentrated in certain intervals.
In a 32-bit machine the minimum number of bytes necessary for the
representation of absolute values, according to the interval in which they lie, is

1 byte  <=>  [0 ; 2^8 − 1]

2 bytes <=>  [2^8 ; 2^16 − 1]          (5)

3 bytes <=>  [2^16 ; 2^24 − 1]

4 bytes <=>  [2^24 ; 2^32 − 1]

However, when we are working with signed integers, one half of this space has to be
used to represent negative integers. Therefore, we end up with the following association
between number of bytes and intervals

1 byte  <=>  [0 ; 2^7 − 1] ∪ [−1 ; −2^7]

2 bytes <=>  [2^7 ; 2^15 − 1] ∪ [−2^7 − 1 ; −2^15]          (6)

3 bytes <=>  [2^15 ; 2^23 − 1] ∪ [−2^15 − 1 ; −2^23]

4 bytes <=>  [2^23 ; 2^31 − 1] ∪ [−2^23 − 1 ; −2^31]

On the other hand, the signal can be explicitly coded, allowing the absolute
values to be represented according to eq. (5). One possible way to do this is to
divide the data into groups of eight elements and to assign the value 1 to a
signal when it is positive and the value 0 when it is negative.

For example:  + − + + − − + +  <=>  1 0 1 1 0 0 1 1
Each group of eight signal bits is then stored in a signal byte and the entire
sequence of signal bytes is stored in a signal vector. The dimension (nsbyte) of this
vector is equal to the number of coordinates divided by 8. Combining eqs. (5) and (6), the
32-bit integer space can be subdivided into seven intervals which need different minimum
numbers of bytes for coding according to whether implicit or explicit signal coding is
used. All this information has been collected in Table I. It can be observed from the
table that explicit signal coding is favored when data is concentrated in intervals 2, 4 or
6.
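
The two coding options can be compared with a short sketch. It is illustrative only: the bit order inside a signal byte (first element in the most significant bit), the treatment of zero as positive, and the function names are assumptions not fixed by the text.

```python
def pack_signs(values):
    """Pack one signal bit per value into bytes, eight values per byte.

    Bit value 1 means positive (zero is treated as positive here),
    0 means negative; the first value of each group goes to the
    most significant bit (an assumed ordering).
    """
    out = bytearray()
    for start in range(0, len(values), 8):
        byte = 0
        for pos, v in enumerate(values[start:start + 8]):
            if v >= 0:
                byte |= 1 << (7 - pos)
        out.append(byte)
    return bytes(out)

def bytes_explicit(value):
    """Minimum bytes for the absolute value alone, following eq. (5)."""
    return max(1, (abs(value).bit_length() + 7) // 8)

def bytes_implicit(value):
    """Minimum bytes for a signed integer, following eq. (6)."""
    for nbytes in (1, 2, 3, 4):
        half = 1 << (8 * nbytes - 1)
        if -half <= value <= half - 1:
            return nbytes
    raise ValueError("value does not fit in 32 bits")

# The sign pattern + - + + - - + + from the example above.
packed = pack_signs([5, -3, 2, 7, -1, -9, 4, 6])
```

A value such as 200 lies between 2^7 and 2^8 − 1, so it needs two bytes with implicit coding but only one byte (plus one shared signal bit) with explicit coding; this is the kind of interval in which explicit coding pays off.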


Concept of structure  Structures are sets within the integer differential trajectory
matrices whose elements exhibit some sort of correlation. BS-VLC is based on the fact
that a structure can be represented by fewer bytes than the entire trajectory matrices. All
the elements within a structure ($S_i$) are represented by the number of bytes (nbmax($S_i$))
necessary to code the element with the largest absolute value. However, it is necessary to
keep one extra byte for each structure to indicate the number of bytes used in its
representation. If a trajectory matrix is subdivided into $n_0$ standard integral
conformations, $n_1$ structures that use implicit signal coding and $n_2$ structures that use
explicit signal coding, the byte length of a compressed file (CFBL) is given by


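$$ \mathrm{CFBL} = \mathrm{ovh} + (n_1+n_2) + \mathrm{nsbyte} + \sum_{i=1}^{n_1} n(S_i)\times \mathrm{nbmax1}(S_i) + \sum_{j=1}^{n_2} n(S_j)\times \mathrm{nbmax2}(S_j) + \mathrm{nso} \qquad (7) $$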


In eq. (7), ovh is the structure organization overhead, which indicates how the
trajectory file was compressed. This information is necessary for the decompression
process, but represents a very small part (less than 0.01%) of CFBL. In the same
equation, $n(S_i)$ and $n(S_j)$ are the numbers of elements of structures $S_i$ and $S_j$, respectively;
nbmax1($S_i$) is the number of bytes necessary to represent the element of structure $S_i$
with the largest absolute value, determined using the third column of Table I; nbmax2($S_j$) is
the number of bytes necessary to represent the element of structure $S_j$ with the largest
absolute value, determined using the fourth column of Table I; nso is the number of bytes
necessary to store the standard integral conformations; and nsbyte is the total number of
elements represented with explicit signal coding divided by 8:

$$ \mathrm{nso} = 4 \times 3N \times n_0 \qquad (8) $$

$$ \mathrm{nsbyte} = \sum_{j=1}^{n_2} n(S_j)/8 \qquad (9) $$
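
As a worked illustration of eqs. (7)-(9), the short Python sketch below adds up the contributions to CFBL. The function name, the representation of each structure as a pair (number of elements, bytes per element), and the rounding up of the signal bytes of each structure to a whole byte are assumptions made for this example.

```python
import math


def compressed_file_byte_length(ovh, n_atoms, n0, implicit_structs, explicit_structs):
    """Evaluate eq. (7); each structure is a pair (n_elements, nbmax)."""
    nso = 4 * 3 * n_atoms * n0                                    # eq. (8)
    # eq. (9); rounding each structure up to a whole byte is an assumption
    nsbyte = sum(math.ceil(n / 8) for n, _ in explicit_structs)
    flags = len(implicit_structs) + len(explicit_structs)         # one byte per structure
    implicit_bytes = sum(n * nb for n, nb in implicit_structs)
    explicit_bytes = sum(n * nb for n, nb in explicit_structs)
    return ovh + flags + nsbyte + implicit_bytes + explicit_bytes + nso


# 10 atoms, one stored conformation, two implicit structures, one explicit one
total = compressed_file_byte_length(
    ovh=16, n_atoms=10, n0=1,
    implicit_structs=[(30, 1), (30, 2)],
    explicit_structs=[(16, 1)],
)
```

In this toy case the stored conformation (120 bytes) dominates; for a real trajectory with thousands of conformations the two sums in eq. (7) are the terms that matter.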

Types  of  structures  The  BS-VLC  algorithm  considers  two  types  of  structures: 
structures  associated  with  temporal  correlations  (blocks  and  conformations  within  a 
block)  and  structures  associated  with  spatial  correlations  (atoms  within  a  block  and 
atomic  cartesian  coordinates  within  a  block). 

Block  The block structure is the largest one considered in the BS-VLC method. A
block is a set of sequential conformations, and this is a natural structure with the
pre-processing and quantization methodologies adopted in this work. If implicit signal
coding is assumed, the byte length of block $B_i$, BBL($B_i$), is given by

$$ \mathrm{BBL}(B_i) = 3N \times \mathrm{nconf}(B_i) \times \mathrm{nbmax1}(B_i) \qquad (10) $$


where nconf($B_i$) is the number of conformations of block $B_i$. If explicit signal coding is
assumed, BBL($B_i$) is given by

$$ \mathrm{BBL}(B_i) = \mathrm{nsbyte}(B_i) + 3N \times \mathrm{nconf}(B_i) \times \mathrm{nbmax2}(B_i) \qquad (11) $$

where nsbyte($B_i$) is calculated as

$$ \mathrm{nsbyte}(B_i) = \mathrm{nconf}(B_i) \times 3N / 8 \qquad (12) $$

Conformation within a block  The conformation within a block is the smallest temporal
structure considered in the BS-VLC method. In this case, BBL($B_i$) with implicit signal
coding is given by

$$ \mathrm{BBL}(B_i) = \mathrm{nconf}(B_i) + 3N \sum_{k=1}^{\mathrm{nconf}(B_i)} \mathrm{nbmax1}(\mathrm{conf}(k)) \qquad (13) $$

If explicit signal coding is assumed, BBL($B_i$) is calculated as

$$ \mathrm{BBL}(B_i) = \mathrm{nconf}(B_i) + \mathrm{nsbyte}(B_i) + 3N \sum_{k=1}^{\mathrm{nconf}(B_i)} \mathrm{nbmax2}(\mathrm{conf}(k)) \qquad (14) $$

Atom within a block  In an MD trajectory it is possible to establish spatial correlations.
An atom within a block is a structure that reflects this type of correlation. For example,
in a protein the differential coordinates of the side chain atoms are usually larger than
the differential coordinates of the main chain atoms. In this case, BBL($B_i$) with implicit
signal coding is given by

$$ \mathrm{BBL}(B_i) = N + 3\,\mathrm{nconf}(B_i) \sum_{k=1}^{N} \mathrm{nbmax1}(\mathrm{at}(k)) \qquad (15) $$

If explicit signal coding is assumed, BBL($B_i$) is calculated as

$$ \mathrm{BBL}(B_i) = N + \mathrm{nsbyte}(B_i) + 3\,\mathrm{nconf}(B_i) \sum_{k=1}^{N} \mathrm{nbmax2}(\mathrm{at}(k)) \qquad (16) $$

Atomic cartesian coordinate within a block  The atomic cartesian coordinate
within a block is a subdivision of the atom structure and it is appropriate when the
potential energy function exhibits some type of anisotropy. Here, BBL($B_i$) with implicit
signal coding is given by

$$ \mathrm{BBL}(B_i) = 3N + \mathrm{nconf}(B_i) \left\{ \sum_{k=1}^{N} \mathrm{nbmax1}(x(k)) + \sum_{k=1}^{N} \mathrm{nbmax1}(y(k)) + \sum_{k=1}^{N} \mathrm{nbmax1}(z(k)) \right\} \qquad (17) $$

If explicit signal coding is assumed, BBL($B_i$) is calculated as

$$ \mathrm{BBL}(B_i) = 3N + \mathrm{nsbyte}(B_i) + \mathrm{nconf}(B_i) \left\{ \sum_{k=1}^{N} \mathrm{nbmax2}(x(k)) + \sum_{k=1}^{N} \mathrm{nbmax2}(y(k)) + \sum_{k=1}^{N} \mathrm{nbmax2}(z(k)) \right\} \qquad (18) $$

In eqs. (17) and (18), x(k), y(k) and z(k) are the cartesian coordinates x, y and z of
atom k in block $B_i$, respectively.
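
The choice among the four structure types can be illustrated with a small Python sketch that evaluates eqs. (10), (13), (15) and (17) for implicit signal coding and keeps the smallest byte length. The data layout (a list of conformations, each a list of (x, y, z) integer differentials), the function names and the toy numbers are assumptions made for this example; explicit coding, eqs. (11), (14), (16) and (18), would be handled in the same way.

```python
def nb_implicit(value: int) -> int:
    """Minimum bytes for a signed integer, following eq. (6)."""
    for nbytes in range(1, 5):
        half = 1 << (8 * nbytes - 1)
        if -half <= value <= half - 1:
            return nbytes
    raise ValueError("value does not fit in 4 bytes")


def nbmax(values) -> int:
    """Bytes required by the most demanding element of a structure."""
    return max(nb_implicit(v) for v in values)


def block_byte_lengths(block):
    """block[c][a] = (x, y, z) integer differentials of atom a in conformation c."""
    nconf, n_atoms = len(block), len(block[0])
    everything = [v for conf in block for atom in conf for v in atom]
    whole = 3 * n_atoms * nconf * nbmax(everything)                    # eq. (10)
    by_conf = nconf + 3 * n_atoms * sum(
        nbmax([v for atom in conf for v in atom]) for conf in block)   # eq. (13)
    by_atom = n_atoms + 3 * nconf * sum(
        nbmax([v for conf in block for v in conf[a]])
        for a in range(n_atoms))                                       # eq. (15)
    by_coord = 3 * n_atoms + nconf * sum(
        nbmax([conf[a][axis] for conf in block])
        for a in range(n_atoms) for axis in range(3))                  # eq. (17)
    return {"block": whole, "conformation": by_conf,
            "atom": by_atom, "coordinate": by_coord}


# Toy block: 4 conformations, 2 atoms; only the x coordinate of atom 1 is large.
block = [
    [(1, -2, 3), (300, 4, -5)],
    [(0, 1, -1), (-400, 2, 6)],
    [(2, 2, 2), (350, -3, 1)],
    [(-1, 0, 1), (310, 0, -2)],
]
lengths = block_byte_lengths(block)
best = min(lengths, key=lengths.get)
```

Here the large values are confined to one coordinate of one atom, so the finest spatial structure wins even though it pays the largest per-structure overhead (3N bytes).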

2.3. DECOMPRESSING THE FILES OBTAINED WITH THE BS-VLC ALGORITHM

A  BS-VLC  decompression  of  files  follows  three  steps  which  will  be  analyzed 
separately,  namely  variable  length  decoding,  inverse  quantization  and  post-processing. 

Variable  Length  Decoding  (VLD)  In  this  step,  the  compressed  file  is  decoded 
restoring  the  original  integer  differential  matrices  [u(i,j)]. 

Inverse quantization  The integer differential matrices [u(i,j)] are used to produce the
decompressed differential matrices [$\Delta\tilde{r}_u(i,j)$] with

$$ \Delta\tilde{r}_u(i,j) = u(i,j)/\mathrm{Scale} \qquad (19) $$

The elements of the decompressed differential matrices differ from those of the original
differential matrices by a quantity $e(\Delta r_u(i,j))$:

$$ \Delta\tilde{r}_u(i,j) = \Delta r_u(i,j) + e(\Delta r_u(i,j)) \qquad (20) $$

The magnitude of the error $e(\Delta r_u(i,j))$ depends on the scaling factor selected.
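
The dependence of the error on the scaling factor can be made concrete with a short Python sketch. It assumes that quantization rounds the scaled differential to the nearest integer; the quantization rule itself is defined outside this section, so this is an assumption, as are the function names.

```python
def quantize(delta: float, scale: float) -> int:
    """Integer differential u; nearest-integer rounding is an assumption."""
    return round(delta * scale)


def inverse_quantize(u: int, scale: float) -> float:
    """Decompressed differential, eq. (19)."""
    return u / scale


def quantization_error(delta: float, scale: float) -> float:
    """The error e of eq. (20) for a single differential coordinate."""
    return inverse_quantize(quantize(delta, scale), scale) - delta
```

Under this assumption the absolute error never exceeds 0.5/Scale, so multiplying the scaling factor by ten reduces the worst-case error by the same factor.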
Post-processing  The decompressed trajectory matrices [$\tilde{r}_u(i,j)$] are then
calculated from the decompressed differential trajectory matrices [$\Delta\tilde{r}_u(i,j)$] and from
the decompressed reference coordinates $\tilde{r}_u^{\,ref}(i,j)$:

$$ \tilde{r}_u(i,j) = \tilde{r}_u^{\,ref}(i,j) + \Delta\tilde{r}_u(i,j) \qquad (21) $$

The elements of the decompressed trajectory matrices differ from the elements
of the original trajectory matrices by a quantity $e(r_u(i,j))$:

$$ \tilde{r}_u(i,j) = r_u(i,j) + e(r_u(i,j)) \qquad (22) $$

The characteristics of the errors $e(r_u(i,j))$ depend on the type of reference coordinates
$r_u^{\,ref}(i,j)$ selected. The nature of this dependence is discussed in section 2.4.
A flow chart of the BS-VLC algorithm is given in Figure 1.


2.4.  SELECTION  OF  THE  REFERENCE  COORDINATES 

Several alternative criteria are possible for choosing the reference
coordinates $r_u^{\,ref}(i,j)$ used in eq. (3). The characteristics of the errors $e(r_u(i,j))$ in eq. (22)
are conditioned by the reference coordinates selected. We have considered the following
types of reference coordinates:
Atomic coordinates  The reference coordinates are defined as the atomic coordinates
immediately before the present $r_u(i,j)$ coordinates:

$$ r_u^{\,ref}(i,j) = r_u(i,j-1) \qquad (23) $$

The standard integral conformation is the first conformation of the trajectory,
$r_u(i,1)$, and the number of bytes necessary to store it is

$$ \mathrm{nso} = 4 \times 3N \qquad (24) $$

Equation (21) becomes

$$ \tilde{r}_u(i,j) = \tilde{r}_u(i,j-1) + \Delta\tilde{r}_u(i,j) \qquad (25) $$

Substituting eq. (20) into eq. (25), one obtains

$$ \tilde{r}_u(i,j) = \tilde{r}_u(i,j-1) + \Delta r_u(i,j) + e(\Delta r_u(i,j)) \qquad (26) $$

Repeating this substitution for all the $\tilde{r}_u(i,j-1)$, we obtain

$$ \tilde{r}_u(i,j) = r_u(i,1) + \sum_{k=2}^{j} \Delta r_u(i,k) + \sum_{k=2}^{j} e(\Delta r_u(i,k)) \qquad (27) $$

or

$$ \tilde{r}_u(i,j) = r_u(i,j) + \sum_{k=2}^{j} e(\Delta r_u(i,k)) \qquad (28) $$

Comparing eqs. (28) and (22), the errors $e(r_u(i,j))$ can be calculated as

$$ e(r_u(i,j)) = \sum_{k=2}^{j} e(\Delta r_u(i,k)) \qquad (29) $$

If the successive values $e(\Delta r_u(i,j))$ do not compensate one another, they give rise to
cumulative errors $e(r_u(i,j))$.

Atomic coordinates within a block  One possible methodology, which partially
prevents the cumulative nature of the error inherent to the previous selection, consists in
truncating its accumulation at the end of each block. In this situation, it is necessary to
store not only the first conformation of the trajectory, but also the first conformation of
all the blocks. Consequently, the error, eqs. (26)-(28), is still cumulative within a block
but becomes null for the transition between the last conformation of a block and the first
conformation of the next block. The compression efficiencies are also slightly reduced
by this procedure:

$$ \mathrm{nso} = 4 \times 3N \times \mathrm{nblo} \qquad (30) $$

where nblo is the total number of blocks used.

Decompressed atomic coordinates  Here, the reference coordinates are defined as

$$ r_u^{\,ref}(i,j) = \tilde{r}_u(i,j-1) \qquad (31) $$

Equation (3) becomes

$$ \Delta r_u(i,j) = r_u(i,j) - \tilde{r}_u(i,j-1) \qquad (32) $$

and eq. (20) can be rewritten as

$$ \Delta\tilde{r}_u(i,j) = r_u(i,j) - \tilde{r}_u(i,j-1) + e(\Delta r_u(i,j)) \qquad (33) $$

Equation (33) can be rearranged as

$$ \tilde{r}_u(i,j-1) + \Delta\tilde{r}_u(i,j) = r_u(i,j) + e(\Delta r_u(i,j)) \qquad (34) $$

or

$$ \tilde{r}_u(i,j) = r_u(i,j) + e(\Delta r_u(i,j)) \qquad (35) $$

Comparing eqs. (35) and (22), we conclude that the errors $e(r_u(i,j))$ can be
calculated as

$$ e(r_u(i,j)) = e(\Delta r_u(i,j)) \qquad (36) $$

Here the error becomes non-cumulative, while the compression efficiency obtained with
the atomic coordinates as reference is preserved. This formulation corresponds to the classical
DPCM (Differential Pulse Code Modulation) scheme (Jayant and Noll, 1984), which is
the basis of several audio, image and video compression algorithms such as JPEG
(Wallace, 1991) and MPEG (ISO/IEC, 1994).
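
The difference between the cumulative error of eq. (29) and the non-cumulative error of eq. (36) can be reproduced with a one-dimensional Python sketch. It is an illustration only: nearest-integer rounding is assumed for the quantization step, and the function names and the toy trajectory are hypothetical.

```python
def decode_with_original_reference(traj, scale):
    """Reference = previous ORIGINAL coordinate, eq. (23); errors add up, eq. (29)."""
    decoded = [traj[0]]  # first conformation stored exactly
    for j in range(1, len(traj)):
        u = round((traj[j] - traj[j - 1]) * scale)  # assumed rounding quantizer
        decoded.append(decoded[-1] + u / scale)
    return decoded


def decode_with_decompressed_reference(traj, scale):
    """Reference = previous DECOMPRESSED coordinate, eq. (31); error as in eq. (36)."""
    decoded = [traj[0]]
    for j in range(1, len(traj)):
        u = round((traj[j] - decoded[-1]) * scale)
        decoded.append(decoded[-1] + u / scale)
    return decoded


# A slow drift of 0.004 per step, quantized with Scale = 100 (step 0.01).
traj = [0.004 * j for j in range(101)]
open_loop = decode_with_original_reference(traj, 100)
closed_loop = decode_with_decompressed_reference(traj, 100)
open_error = abs(open_loop[-1] - traj[-1])
closed_error = max(abs(d - t) for d, t in zip(closed_loop, traj))
```

In the first variant every step of the drift is rounded to zero, so the decoded coordinate never moves and the error grows linearly with time; in the second variant the encoder sees its own past rounding errors and corrects them, so the error stays within half a quantization step.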

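The best way to evaluate the cumulative characteristics of the error associated
with the compression/decompression process is to represent the conformational root
mean square deviation (rms) between the decompressed coordinates $\tilde{r}_u(i,j)$ and the original
coordinates $r_u(i,j)$,

$$ \mathrm{rms} = \left\{ \left[ \sum_{i=1}^{N} \left( (\tilde{r}_x(i,j)-r_x(i,j))^2 + (\tilde{r}_y(i,j)-r_y(i,j))^2 + (\tilde{r}_z(i,j)-r_z(i,j))^2 \right) \right] / (3N-1) \right\}^{1/2} \qquad (37) $$

as a function of simulation time.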

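The total root mean square deviation ($\overline{\mathrm{rms}}$) is given by

$$ \overline{\mathrm{rms}} = \left\{ \left[ \sum_{j=1}^{n} \sum_{i=1}^{N} \left( (\tilde{r}_x(i,j)-r_x(i,j))^2 + (\tilde{r}_y(i,j)-r_y(i,j))^2 + (\tilde{r}_z(i,j)-r_z(i,j))^2 \right) \right] / (3N \times n - 1) \right\}^{1/2} \qquad (38) $$

allowing an evaluation of the global precision of the compression/decompression
process.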
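
A direct transcription of eq. (38) into Python might look as follows. The sketch assumes that n in eq. (38) is the number of conformations and that both trajectories are stored as lists of conformations, each a list of (x, y, z) tuples; the function name is hypothetical.

```python
import math


def total_rms(original, decompressed):
    """Total root mean square deviation of eq. (38).

    original[j][i] and decompressed[j][i] are the (x, y, z) coordinates of
    atom i in conformation j.
    """
    n_conf = len(original)
    n_atoms = len(original[0])
    squared = sum(
        (d - o) ** 2
        for conf_o, conf_d in zip(original, decompressed)
        for atom_o, atom_d in zip(conf_o, conf_d)
        for o, d in zip(atom_o, atom_d)
    )
    return math.sqrt(squared / (3 * n_atoms * n_conf - 1))


original = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
            [(2.0, 2.0, 2.0), (3.0, 3.0, 3.0)]]
shifted = [[tuple(c + 0.1 for c in atom) for atom in conf] for conf in original]
deviation = total_rms(original, shifted)
```

Restricting the outer sum to a single conformation j gives the time-resolved quantity that is plotted against simulation time.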
2.5. IMPLEMENTATION OF THE BS-VLC ALGORITHM


In the BS-VLC compression scheme the following procedure has to be
followed:

Pre-processing 

1)  Read  initial  data  [ru(i,j)]. 

2)  Select  the  number  of  blocks  (nblo). 

3)  Select  the  reference  coordinates  (atomic  coordinates,  atomic  coordinate  within  a 
block  or  decompressed  atomic  coordinates). 

4)  Store  the  standard  integral  conformations  [r®  (i,k)]. 

5)  Calculate  the  differential  trajectory  matrices  [Aru(i,j)]. 

Quantization 

1)  Select  the  scaling  factor  (Scale). 

2)  Calculate  the  integer  differential  matrices  [u(i,j)]. 

VLC 

For  each  block: 

1)  Select  the  structure  (entire  block,  conformation  within  the  block,  atom  within  the 
block  or  atomic  cartesian  coordinate  within  the  block)  and  the  signal  coding  (explicit 
or  implicit)  which  will  allow  a  more  efficient  compression  of  the  block  (eqs.  (10)-(18)). 

2)  Compress  the  block  using  the  structure  and  the  signal  coding  selected  in  1). 

The byte length of the compressed file (CFBL) is given by

$$ \mathrm{CFBL} = \mathrm{ovh} + \sum_{i=1}^{\mathrm{nblo}} \mathrm{BBL}(B_i) + \mathrm{nso} \qquad (39) $$

In the BS-VLC decompression scheme the following procedure must be
followed:

VLD 

For  each  block: 

1) Read the structure (entire block, conformation within the block, atom within the
block or atomic cartesian coordinate within the block) and the signal coding (explicit or
implicit) used in the compression.

2) Decompress the block using the structure and signal coding read in 1) (the integer
differential matrices [u(i,j)] are rebuilt).

Inverse  quantization 

1)  Read  the  scaling  factor  (Scale). 

2) Calculate the decompressed differential matrices [$\Delta\tilde{r}_u(i,j)$].

Post-processing 

1)  Read  the  standard  integral  conformations  [r^(i,k)]. 

2) Calculate the decompressed trajectory matrices [$\tilde{r}_u(i,j)$].
3. RESULTS AND CONCLUSIONS


In this work two solvated protein systems, trypsin and the trypsin:PTI complex,
have been studied. Trypsin is a digestive enzyme and PTI (Pancreatic Trypsin Inhibitor)
is a natural inhibitor of trypsin.

In the MD simulations (Melo and Ramos, 1997), all the water molecules and
amino acid residues within a 15 Å sphere, centered in the active center of trypsin, were
allowed to move. Harmonic forces were used to restrain any water molecule from
leaving the 15-18 Å boundary; this was achieved by constraining the oxygen atoms of
the water molecules to their initial positions using a force constant of 0.6 kcal mol^-1 Å^-2.
The other residues were included in the determination of the energy and forces, but
were kept fixed in their starting positions. The MD calculation was performed in this
way because the size of the simulation would otherwise have been prohibitive
(Melo and Ramos, 1997). A total of 1815 atoms for
solvated trypsin and 1666 for solvated trypsin:PTI were allowed to move during the MD
simulations. Newton's equations of motion were integrated every 0.001 ps using the
leap-frog algorithm (van Gunsteren and Berendsen, 1990) and 120 ps trajectories were
generated for both systems studied. Total simulation time was 6 240 ns for trypsin,
6 400 ns for trypsin:PTI and 12 640 ns for both trypsin and trypsin:PTI. The trajectory
matrices have the following dimensions:

$r_u(1815, 6000)$ => for solvated trypsin
$r_u(1666, 6000)$ => for solvated trypsin:PTI
All simulations were carried out with the program CHARMM (Brooks et al.,
1983) and the two trajectories were compressed using the lossless compressors
(compress, gzip, pack and compact) as well as the BS-VLC algorithm.

In  the  BS-VLC  compressions,  a  total  number  of  60  blocks  was  used.  Preliminary 
compressions,  using  the  three  alternative  reference  coordinates  (atomic  coordinates, 
atomic  coordinates  within  a  block  and  decompressed  atomic  coordinates)  and  a  scaling 
factor  of  10^,  were  performed.  The  conformational  root  mean  square  deviation  (rms) 
was  computed  as  a  function  of  simulation  time.  The  results  obtained  are  presented  in 
Figure  2;  they  confirm  that  atomic  coordinates,  atomic  coordinates  within  a  block  and 
decompressed  atomic  coordinates  lead  to  a  cumulative,  a  truncated  cumulative  and  a 
non-cumulative error, respectively. Consequently, as pointed out in section 2.4, the
decompressed atomic coordinates are the most appropriate choice of reference
coordinates. This selection was used in the remaining BS-VLC compressions presented
here.

To evaluate the efficiency of the BS-VLC algorithm, several compressions of the
trypsin and trypsin:PTI trajectory files were performed, using 28 different values as
scaling factors. The total root mean square deviations ( rms ) were computed for all the
cases. The results obtained are presented in Table II and Figure 3. Additionally, the
compression efficiencies obtained with the different algorithms can be visualized in
Figure 4.

The analysis of Table II, Figure 3 and Figure 4 is extremely favorable to the
BS-VLC algorithm. In fact, when a scaling factor of 10^7 is used, the BS-VLC algorithm has
a lossless behavior and presents a significantly larger compression efficiency than the
classical lossless algorithms.
Green et al. (1995) have made available a compression algorithm which reaches a
compression efficiency of 70%, as compared to the 75% achieved in this work. Additionally,
their algorithm leads to a cumulative error in the compression/decompression process,
while BS-VLC has a non-cumulative error behavior.

All the results obtained enable us to conclude that BS-VLC has a near
lossless behavior (rms=0) when a scaling factor close to 10^7 is used. In this situation,
BS-VLC nearly triples the compression efficiency of the best classical lossless
algorithm (LZ77, used in gzip). In addition, larger compression efficiencies (≈50%) can
be reached with BS-VLC while preserving a high degree of precision ( rms below 10^-5 Å
in Table II). For compression efficiencies larger than 50%, the precision decreases
significantly. However, a compression with the maximum efficiency possible (75%)
within this algorithm can still be performed with good precision ( rms ≈ 3 × 10^-3 Å in Table II).
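
The total rms values reported for the smallest scaling factors in Table II are close to what a simple model predicts. If the quantizer rounds to the nearest integer and the rounding error is uniformly distributed over one quantization step 1/Scale (both are assumptions, not statements made in this work), the rms error is 1/(Scale × √12). The Python sketch below evaluates this estimate; the function name is hypothetical.

```python
import math


def expected_rms(scale: float) -> float:
    """rms of a rounding error uniformly distributed over one step of width 1/scale."""
    return 1.0 / (scale * math.sqrt(12.0))


# Scaling factors from the last three rows of Table II
estimates = {scale: expected_rms(scale) for scale in (1.0e3, 5.0e2, 1.0e2)}
```

For Scale = 10^3, 5 × 10^2 and 10^2 the estimate agrees with the tabulated values (about 2.887 × 10^-4, 5.773 × 10^-4 and 2.887 × 10^-3 Å) to the printed precision, which is consistent with a non-cumulative error whose size is set by the scaling factor alone.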
FIGURE CAPTIONS


FIGURE  1.  Flow  chart  of  the  BS-VLC  algorithm.  The  following  steps  have  to  be 
considered: 

Compression: (1) Pre-processing; (2) Quantization; (3) VLC. Decompression: (1')
VLD;  (2')  Inverse  quantization;  (3')  Post-processing. 

FIGURE  2.  Conformational  root  mean  square  deviations  (rms)  between  the  original 
trajectories, obtained by molecular dynamics applied to (A) trypsin and (B) trypsin:PTI
complex,  and  the  trajectories  obtained  by  sequential  BS-VLC  compression  and 
decompression  with  a  scaling  factor  of  10^  as  function  of  the  simulation  time  (t).  Three 
alternative  reference  coordinates,  atomic  coordinates  ( — ),  atomic  coordinates  within  a 
block  (...)  and  decompressed  atomic  coordinates  ( — ),  were  used. 

FIGURE  3.  Total  root  mean  square  deviations  ( rms )  between  the  original  trajectories, 
obtained  by  molecular  dynamics  applied  to  (A)  trypsin  and  (B)  trypsin:PTI  complex, 
and  the  trajectories  obtained  by  sequential  BS-VLC  compression  and  decompression 
with  28  different  scaling  factors  as  function  of  compression  efficiency. 

FIGURE  4.  Byte  length  (BL)  of  initial  trajectory  files,  obtained  by  molecular  dynamics 
applied  to  trypsin  and  trypsin:PTI  complex,  and  of  compressed  files  obtained  using 
different  algorithms.  The  compression  efficiencies  and  total  root  mean  square 
deviations,  between  the  original  trajectory  and  the  trajectories  obtained  by  sequential 
compression  and  decompression,  are  also  indicated  within  round  and  square  brackets, 
respectively. 

Table I: Minimum number of bytes needed to code different intervals of integer numbers.

Minimum  no.  of  byte  coding 


Table II. Compression efficiencies and total root mean square deviations ( rms ) between the original trajectories,
obtained by molecular dynamics applied to trypsin and trypsin:BPTI complex, and the trajectories obtained by
sequential BS-VLC compression and decompression with a scaling factor (Scale).


Trypsin                                             Trypsin:BPTI
Scale           Compression       rms (Å)           Scale           Compression       rms (Å)
                efficiency (%)                                      efficiency (%)

1.000 × 10^7    24.7              0.000             1.000 × 10^7    24.7              0.000
9.400 × 10^6    24.8              6.000 × 10^-9     9.400 × 10^6    24.9              6.000 × 10^-9
8.850 × 10^6    24.9              7.000 × 10^-9     8.300 × 10^6    25.0              9.000 × 10^-9
8.300 × 10^6    25.0              7.100 × 10^-9     7.200 × 10^6    25.1              1.400 × 10^-8
7.200 × 10^6    25.1              1.000 × 10^-8     4.108 × 10^5    26.3              3.664 × 10^-7
3.664 × 10^5    27.3              8.140 × 10^-7     3.664 × 10^5    29.2              6.090 × 10^-7
3.220 × 10^5    30.2              9.290 × 10^-7     3.220 × 10^5    32.3              7.790 × 10^-7
2.776 × 10^5    32.3              1.055 × 10^-6     2.776 × 10^5    34.6              9.050 × 10^-7
2.332 × 10^5    33.5              1.321 × 10^-6     2.332 × 10^5    35.6              1.055 × 10^-6
1.888 × 10^5    34.7              1.653 × 10^-6     1.888 × 10^5    36.7              1.285 × 10^-6
1.444 × 10^5    38.1              2.182 × 10^-6     1.444 × 10^5    39.4              1.685 × 10^-6
1.000 × 10^5    45.0              2.923 × 10^-6     1.000 × 10^5    45.2              3.197 × 10^-6
9.400 × 10^4    45.7              3.097 × 10^-6     9.400 × 10^4    45.8              3.421 × 10^-6
8.850 × 10^4    46.1              3.268 × 10^-6     8.850 × 10^4    46.2              3.651 × 10^-6
8.300 × 10^4    46.4              3.507 × 10^-6     8.300 × 10^4    46.4              3.884 × 10^-6
7.750 × 10^4    46.6              3.767 × 10^-6     7.750 × 10^4    46.6              4.130 × 10^-6
7.200 × 10^4    46.7              4.077 × 10^-6     7.200 × 10^4    46.7              4.399 × 10^-6
6.650 × 10^4    46.8              4.430 × 10^-6     6.650 × 10^4    46.8              4.699 × 10^-6
6.100 × 10^4    46.9              4.787 × 10^-6     6.100 × 10^4    46.9              5.021 × 10^-6
5.550 × 10^4    47.0              5.221 × 10^-6     5.550 × 10^4    47.0              5.399 × 10^-6
5.000 × 10^4    48.2              5.794 × 10^-6     5.000 × 10^4    48.4              5.849 × 10^-6
4.552 × 10^4    49.1              6.392 × 10^-6     4.552 × 10^4    49.3              6.404 × 10^-6
4.108 × 10^4    49.5              7.057 × 10^-6     4.108 × 10^4    49.6              7.052 × 10^-6
3.664 × 10^4    49.8              7.889 × 10^-6     3.664 × 10^4    49.8              8.042 × 10^-6
3.220 × 10^4    50.0              9.005 × 10^-6     3.220 × 10^4    50.0              9.134 × 10^-6
1.000 × 10^3    57.9              2.886 × 10^-4     1.000 × 10^3    60.1              2.887 × 10^-4
5.000 × 10^2    65.4              5.773 × 10^-4     5.000 × 10^2    66.4              5.772 × 10^-4
1.000 × 10^2    75.0              2.886 × 10^-3     1.000 × 10^2    75.0              2.887 × 10^-3

ABSTRACT 


Molecular dynamics is a well-known technique widely used in the study of
biomolecular systems. The trajectory files produced by molecular dynamics simulations
are extensive and the classical lossless algorithms give poor efficiencies in their
compression. In this work, a new specific algorithm, named Byte Structure Variable
Length Coding (BS-VLC), is introduced. Trajectory files, obtained by molecular
dynamics applied to trypsin and the trypsin:PTI complex, were compressed using four
classical lossless algorithms (Huffman, adaptive Huffman, LZW and LZ77) as well as
the BS-VLC algorithm. The results obtained show that BS-VLC nearly triples the
compression efficiency of the best classical lossless algorithm, preserving a near
lossless behavior. Compression efficiencies close to 50% can be obtained with a high
degree of precision, and the maximum efficiency possible within this algorithm (75%)
can be reached with good precision.
Abstract

The preparation of ozone/nitrogen oxides mixtures in air containing the nitrate radical,
their reaction with the unsaturated lipid 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine
(POPC), and the determination of the reaction products by MALDI-FTMS, in comparison to
those obtained from a reaction with only ozone in air, are described. The results indicate the
importance of the nitrate radical in ozone toxicity.

Introduction 

Ozone  is  the  most  abundant  oxidant  in  polluted  air  and  its  adverse  effects  on  human 
health  are  well  documented.’  The  lung  is  the  organ  that  is  most  affected  by  ozone.  Short  term 
exposure  to  high  levels  of  ozone  leads  to  acute  inflammatory  reactions  and  pulmonary  edema 
whereas  prolonged  exposure  to  lower  levels  produces  emphysema,  bronchopneumonia  and 
fibrosis.  Numerous  studies  have  established  the  ability  of  ozone  to  react  with  species  present 
in  the  lung;  they  include  the  amino  acids,  peptides  and  proteins.^’^  However,  the  primary 
target  of  ozone  is  thought  to  be  the  unsaturated  fatty  acids  (UFA)  in  the  fluid  layer  of  the  lung 
lining and in the epithelial cell membranes of the lung.'*’^ Because of its reactivity, ozone
does  not  penetrate  far  into  the  cells  that  line  the  airways;  consequently,  many  pulmonary  and 
all  extrapulmonary  effects  of  ozone  must  be  caused  by  messenger  species.  Thus,  inhalation  of 
even  low  ozone  concentrations  can  cause  the  release  of  proinflammatory  mediators  in  the 
lung,  and  it  is  these  mediators  that  lead  to  the  inflammation  and  other  effects  associated  with 
ozone. 

The  cascade  hypothesis^  states  that  lipid  ozonation  products  (LOP)  relay  the  effects  of 
ozone  into  deeper  tissue  strata  at  the  lung-air  interface  than  ozone  itself  can  reach.  LOP,  rather 
than  products  from  ozonation  of  proteins  or  nucleic  acids,  are  thought  to  be  signal 
transduction  species  because  ozonation  of  UFA  leads  to  small,  diffusible,  stable  or  metastable 
species,  and  because  lipid  oxidation  products  are  known  to  act  as  signal  transduction  agents  in 
other  systems.  Thus,  the  likely  candidates  for  signal  transduction  species  are  LOP  produced  in 
the  Criegee  ozonation  process,  which  gives  a  predictable  spectrum  of  products.*'^*  Recent 
results  by  Friedman,  Pryor  and  coworkers  strongly  support  this  hypothesis. 

Nitrogen  oxides  (NO  and  NO2)  are  usually  present  in  a  mixture  with  other  air 
pollutants  in  real-life  exposures.  With  ozone  present,  all  nitric  oxide  is  converted  to 
nitrogen  dioxide.  Early  studies  indicated  that  the  responses  to  ozone  and  nitrogen  dioxide 
were  additive,  but  it  was  also  found  that  the  immediate  effects  in  rat  lungs  were  dissimilar 
with  respect  to  lipid  peroxidation,  lung  protein  or  nonprotein  sulfhydryl  levels.^®  Starting  in 
the  1980’s,  Pryor  and  coworkers^*'^^  investigated  the  oxidation  of  biological  molecules  by 
nitrogen  dioxide.  The  relatively  high  tolerance  for  both  long-term  and  short-term  exposure  to 
ambient  nitrogen  dioxide  made  it  unnecessary  to  include  NO2  chemistry  in  the  cascade 
hypothesis  of  ozone  toxicity.  However,  the  combined  action  of  ozone  and  nitrogen  dioxide 
must  take  account  of  the  fact  that  these  gases  rapidly  react  to  form  the  nitrate  radical,  a  very 
potent oxidant. In fact, NO3 reacts with unsaturated organic molecules more than a thousand
times  faster  than  does  ozone.^"* 

Here we report the preparation of ozone/nitrogen oxides mixtures in air containing the
nitrate radical. The reaction of such mixtures is carried out with
1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) applied on a surface, and the reaction
products are determined by matrix-assisted laser desorption/ionization Fourier transform mass
spectrometry (MALDI-FTMS). The results are compared with those obtained from a parallel
reaction in which only ozone in air was used to react with POPC. The optimal conditions for
the determination of the reaction products and for the elucidation of the synergistic effects
of ozone and nitrate radical in the heterogeneous reaction with lipids are determined.

Experimental 


Reactions.  An  apparatus  (Fig.  2)  consisting  of  an  evacuable  glass  reaction  column  and 
mixing  chamber,  maintained  in  the  dark  because  of  the  ready  photodegradability  of  nitrate 
radical,  was  used  to  perform  the  reactions.  The  lipid,  POPC  (Sigma  Chemical  Co.,  St  Louis, 
MO),  dissolved  in  dichloromethane  and  spread  inside  the  reaction  column,  yielded  a  thin  film 
after  evaporation.  The  lipid  film  was  allowed  to  react  with  the  gas  mixture  from  the  mixing 
chamber  for  a  specified  time.  If  necessary  to  ensure  completion  of  the  reaction,  the  previously 
reacted  thin  layer  of  sample  was  redissolved  and,  using  the  same  procedure,  was  again 
exposed  to  the  same  gas  mixture.  The  gas  reaction  mixtures  were  prepared  by  mixing  streams 
of  known  concentrations  of  ozone  in  pure  oxygen  with  a  stream  of  pure  nitrogen,  which  did  or 
did  not  contain  a  known  concentration  of  nitric  oxide.  The  concentrations  within  the  gas 
streams  produced  air  samples  with  environmentally-relevant  concentrations  of  ozone  and 
nitrogen dioxide (approx. 100 ppb of each). Thus, in conditions of excess ozone, fast conversion
of nitric oxide produced O3/NO2 mixtures which, in the dark, react more slowly to give nitrate
radicals. Either ozone/air or (ozone + nitrogen oxides)/air mixtures were allowed to react with
the lipid films, and the reaction products of each reaction were compared.


Mass spectra. Mass spectra were recorded on an FTMS 2001 DD spectrometer
(Finnigan FT/MS, Madison, WI, USA) equipped with a 3 T superconducting magnet and a
Nicolet 1280 data station, using a pulsed nitrogen laser (VSL-337ND-S, Laser Science, Inc.,
Franklin, MA, USA) at 337 nm for MALDI experiments. We initially used
2,5-dihydroxybenzoic acid (DHB) as the MALDI matrix. It gave abundant positive and negative
fragment ions for POPC and its ozonation products but little or no protonated (m/z =
760), sodiated (m/z = 782) and ozonated (m/z = 808) molecular ions. To avoid fragmentation
we tested, unsuccessfully, several other matrices (e.g. p-nitrobenzoic acid and 3-nitrophenol)
and sample probe cooling. Only 3-nitrobenzyl alcohol as a matrix with a cooled sample probe
yielded phospholipid ions and mass spectra of their reaction products with little or no
fragmentation and formation of alkali metal adduct peaks. All the spectra were collected after
a single laser shot.

Calculations. Because the rate constants for all the respective reactions are known
(Table 1), it is possible to calculate the development of the NO3 radical concentration starting
with O3 + NO at room temperature and normal pressure in air. Calculation results for the first
1000 s of 50 ppb NO with 100, 150 and 200 ppb of ozone show final concentrations of 0.11, 0.24
and 0.60 ppb for NO3, and 48, 96 and 141 ppb for ozone, respectively. In these calculations the
photoreaction of NO3 (dark conditions) and the decomposition of ozone (amounting to less than
10% for the investigated time period, as independently determined) were omitted. The results are
shown in Fig. 2a-c.
The ratio of O3/NO3 in our reaction conditions is expected to be ~ 500 which gives the
nitrate  radical  a  more  than  twentyfold  advantage  over  ozone  in  reaction  with  POPC.  Although 
N2O5  is  present  in  higher  concentration  than  is  NO3,  it  is  of  only  minor  importance  because  it 
reacts  with  water  yielding  HNO3  which  will  remain  bound. 
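
A calculation of this kind can be sketched as a small kinetic integration. The code below is not the authors' program: it is a minimal explicit-Euler sketch, and the rate constants (standard room-temperature values taken to correspond to the six reactions of Table 1), the ppb-to-number-density factor and the step size are all assumptions made for illustration.

```python
# ASSUMPTION: number density of air at ~298 K and 1 atm is ~2.46e19 cm^-3,
# so 1 ppb corresponds to ~2.46e10 molecules cm^-3.
PPB = 2.46e10

# ASSUMED rate constants (cm^3 molecule^-1 s^-1, or s^-1 for the unimolecular step).
K_NO_O3 = 1.8e-14     # NO  + O3  -> NO2 + O2
K_NO_NO3 = 2.6e-11    # NO  + NO3 -> 2 NO2
K_NO2_O3 = 3.2e-17    # NO2 + O3  -> NO3 + O2
K_NO2_NO3 = 2.0e-12   # NO2 + NO3 (+M) -> N2O5
K_N2O5 = 6.9e-2       # N2O5 (+M) -> NO2 + NO3
K_NO3_NO3 = 8.5e-13   # NO3 + NO3 -> 2 NO2 + O2


def simulate(no_ppb, o3_ppb, t_end=1000.0, dt=0.01):
    """Integrate the six-reaction scheme with explicit Euler steps.

    Returns the final mixing ratios in ppb as a dict.
    """
    no, o3 = no_ppb * PPB, o3_ppb * PPB
    no2 = no3 = n2o5 = 0.0
    for _ in range(int(round(t_end / dt))):
        r1 = K_NO_O3 * no * o3
        r2 = K_NO_NO3 * no * no3
        r3 = K_NO2_O3 * no2 * o3
        r4 = K_NO2_NO3 * no2 * no3
        r5 = K_N2O5 * n2o5
        r6 = K_NO3_NO3 * no3 * no3
        d_no = -r1 - r2
        d_o3 = -r1 - r3
        d_no2 = r1 + 2 * r2 - r3 - r4 + r5 + 2 * r6
        d_no3 = -r2 + r3 - r4 + r5 - 2 * r6
        d_n2o5 = r4 - r5
        no += dt * d_no
        o3 += dt * d_o3
        no2 += dt * d_no2
        no3 += dt * d_no3
        n2o5 += dt * d_n2o5
    return {"NO": no / PPB, "O3": o3 / PPB, "NO2": no2 / PPB,
            "NO3": no3 / PPB, "N2O5": n2o5 / PPB}


if __name__ == "__main__":
    for ozone in (100, 150, 200):
        final = simulate(50, ozone)
        print(ozone, {name: round(value, 3) for name, value in final.items()})
```

The explicit Euler step is chosen only for transparency; a stiff ODE solver would be the more usual choice, and the NO3 values obtained depend on the rate constants assumed here.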

Results  and  Discussion 

Compared to prior ionization techniques, phospholipid analysis with MALDI-FTMS
provides much higher mass resolving power and sensitivity. The only previous studies of
lipid ozonation products, none of them with NO3 present, were done by Finlayson-Pitts and
coworkers³⁸,³⁹ using fast atom bombardment (FAB); phospholipids were investigated by
Marshall and coworkers⁴⁰ using MALDI-FTMS. The latter study is very useful for the present
investigation because it provides an experimental fragmentation scheme for POPC and
introduces the use of cooled matrices, which is crucial for the study of reaction products
with ozone and nitrate radical.

Using  a  3-nitrobenzyl  alcohol  matrix  (~  3000:1  matrix-to-analyte  ratio)  we  observe 
negligible  fragmentation  of  the  lipid  and  a  weaker  sodiated  molecular  ion  which  sometimes  is 
not  observed.  The  mass  spectra  of  POPC  and  its  reaction  products  either  with  ozone  or  with 
ozone  containing  NO3  obtained  using  a  3-nitrobenzyl  alcohol  matrix  are  shown  in  Fig.  3a-c. 
The reaction of POPC with ozone is expected to proceed via the unstable primary ozonide,
which fragments to the secondary (Criegee) ozonide via zwitterionic species (Scheme 1). The
Criegee ozonide decomposes, yielding an acid and aldehyde pair, i.e. either PC/AC +
ALD/C9 or PC/ALD + AC/C9 (see Scheme 1 for definitions and structures). The
MALDI mass spectrum of the products of the incomplete ozonation of POPC (Fig. 3b),
which is similar to that obtained by FAB,³⁸ confirms the products predicted by
Scheme 1 and indicates a slightly higher probability for AC/C9 than ALD/C9 formation (i.e.
a higher PC/ALD than PC/AC peak).

The comparison of the products after extensive reaction of POPC (Fig. 3c) with ozone
(LHS) and with the ozone/nitrogen oxides mixture (RHS) shows that the products are similar
but that their yields change dramatically. In the mass spectrum of the reaction products
with ozone, a new ion of m/z = 638 is observed, which could correspond to loss of O2 from
some peroxide-type product structure (X-O2), whereas in the reaction products including
nitrate radical an unknown ion of m/z = 623 appears, which could correspond to loss of NO2
from a similar nitrite structure (X-ONO). The formation of new products with (O3 + NOx) in
the dark is not surprising since NO3 exhibits much higher reactivity with unsaturated
organics than ozone.²⁴

These results indicate the importance of using realistic and more complex mixtures of
oxidants in the study of ambient ozone toxicity. Clearly, further study of these complex
mixtures will be rewarding.

Acknowledgement 


This work was financed by the Ministry of Science of Croatia and submitted for support to the Fogarty
International Research Collaboration Award (Parent Grant #1R01 ES08663-01 (Friedman, Pryor), NIH/NIEHS).



References 


(1) Lippman, M. Health effects of ozone. J. Air Pollut. Contr. Assoc. 1989, 39, 672-695.

(2) Pryor, W. A.; Dooley, D. F. and Church, D. F. Mechanisms for the reaction of ozone with
biological molecules: The source of the toxic effects of ozone. In: Advances in Modern
Environmental Toxicology, Volume V. The Biomedical Effects of Ozone and Related
Photochemical Oxidants (Lee, S. D., Mustafa, M. G. and Mehlman, M. A. eds.), Princeton
Scientific Publishers, Princeton, NJ, 1982, pp 7-19.

(3)  Uppu,  R.  M.  and  Pryor,  W.  A.  The  reactions  of  ozone  with  proteins  and  unsaturated  fatty 
acids  in  reverse  micelles.  Chem.  Res.  Toxicol.  1994,  7,  47-55. 

(4) Postlethwait, E. M.; Cueto, R.; Velsor, L. W. and Pryor, W. A. O3-induced formation of
bioactive lipids: estimated surface concentrations and lining layer effects. Am. J. Physiol.
1998, 274 (Lung Cell. Mol. Physiol. 18), L1006-L1016.

(5)  Kafoury,  R.M.;  Pryor,  W.A;  Squadrito,  G.L.;  Salgo,  M.G.;  Zou,  X.  and  Friedman,  M. 
Lipid  ozonation  products  activate  phospholipase  A2,  C,  and  D.  Toxicol.  Appl.  Pharmacol 
1998,  150,  338-349. 

(6) Leikauf, G. D.; Zhao, Q.; Zhou, S. and Santrock, J. Ozonolysis products of membrane
fatty acids activate eicosanoid metabolism in human airway epithelial cells. Am. J. Respir.
Cell Mol. Biol. 1993, 9, 594-602.

(7)  Pryor,  W.  A.;  Squadrito,  G.  L.  and  Friedman,  M.  The  cascade  mechanism  to  explain 
ozone  toxicity:  the  role  of  lipid  ozonation  products.  Free  Radic.  Biol  Med.  1995,  19,  935- 
941. 



(8)  Pryor,  W.  A.  and  Wu,  M.  Ozonation  of  methyl  oleate  in  hexane,  in  a  thin  film,  in  SDS 
micelles,  and  in  distearoylphosphatidylcholine  liposomes:  yields  and  properties  of  the  Criegee 
ozonide.  Chem.  Res.  Toxicol.  1992,  5,  505-511. 

(9) Squadrito, G. L.; Uppu, R. M.; Cueto, R. and Pryor, W. A. Production of the Criegee
ozonide during the ozonation of 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine
liposomes. Lipids 1992, 27, 955-958.

(10)  Criegee,  R.  In:  Peroxide  Reaction  Mechanisms  Edwards,  J.  D.;  Ed.;  Wiley-Interscience: 
New  York,  1962 

(11) Bailey, P. S. Ozonation in Organic Chemistry, Vol. 1, Academic Press: New York, 1978

(12)  Pryor,  W.  A.;  Das,  B.  and  Church,  D.  F.  The  ozonation  of  unsaturated  fatty  acids: 
aldehydes  and  hydrogen  peroxide  as  products  and  possible  mediators  of  ozone  toxicity.  Chem. 
Res.  Toxicol.  1991,  4,  341-348. 

(13)  Cueto,  R.;  Squadrito,  G.  L.  and  Pryor,  W.  A.  Quantifying  aldehydes  and  distinguishing 
aldehydic  product  profiles  from  autoxidation  of  unsaturated  fatty  acids.  Methods  Enzymol. 
1994,  233,  174-182. 

(14) Pryor, W. A. and Church, D. F. The reaction of ozone with unsaturated fatty acids:
aldehydes and hydrogen peroxide as mediators of ozone toxicity. In: Oxidative Damage &
Repair: Chemical, Biological and Medical Aspects, Pergamon Press, New York, 1991,
pp 496-504.

(15)  Church,  D.  F.;  McAdams,  M.  and  Pryor,  W.  A.  Free  radical  production  from  the 
ozonation  of  simple  alkenes,  fatty  acid  emulsions  and  phosphatidylcholine  liposomes.  In: 
Oxidative  Damage  &  Repair:  Chemical,  Biological  and  Medical  Aspects,  Pergamon  Press, 
New  York,  1991,  pp  517-522. 



(16) Pryor, W. A.; Stanley, J. P.; Blair, E. and Cullen, G. B. Autoxidation of polyunsaturated
fatty acids. Part I. Effect of ozone on the autoxidation of neat methyl linoleate and methyl
linolenate. Arch. Environ. Health 1976, 31, 201-210.

(17) Pryor, W. A.; Stanley, J. P. and Blair, E. Autoxidation of polyunsaturated fatty acids.
Part II. A suggested mechanism for the formation of TBA-reactive materials from
prostaglandin-like endoperoxides. Lipids 1976, 11, 370-379.

(18) Penkett, S. A.; Blake, N. J.; Lightman, P.; Marsh, A. R. W.; Anwyl, P. and Butcher, G.
The seasonal variation of nonmethane hydrocarbons in the free troposphere over the north
Atlantic ocean: Possible evidence for extensive reaction of hydrocarbons with the nitrate
radical. J. Geophys. Res. 1993, 98, 2865-2885.

(19) Aschmann, S. M. and Atkinson, R. Rate constants for the reactions of the NO3 radical
with alkanes at 296 ± 2 K. Atmos. Environ. 1995, 29, 2311-2316.

(20)  Lindvall,  T.  Health  effects  of  nitrogen  dioxide  and  oxidants.  J.  Work  Environ.  Health 
1985,  11  (Suppl.  3),  10-28. 

(21)  Pryor,  W.  A.;  Frier,  D.  G.;  Lightsey,  J.  W.  and  Church,  D.  F.  Initiation  of  the 
autoxidation  of  polyunsaturated  fatty  acids  (PUFA)  by  ozone  and  nitrogen  dioxide.  In: 
Autoxidation  in  Food  and  Biological  Systems  (Simic,  M.  G.  and  Karel,  M.  eds.).  Plenum 
Publishing  Corp.,  New  York,  1980,  pp  1-16. 

(22)  Pryor,  W.  A.;  Lightsey,  J.  W.  Mechanisms  of  nitrogen  dioxide  reactions:  Initiation  of 
lipid  peroxidation  and  the  production.  Science  1981,  214,  435-437. 

(23) Gallon, A. A. and Pryor, W. A. The reaction of low levels of nitrogen dioxide with
methyl linoleate in the presence and absence of oxygen. Lipids 1993, 29, 171-176.

(24) Atkinson, R.; Arey, J.; Aschmann, S. M.; Corchnoy, S. B. and Shu, Y. Rate constants for
the gas-phase reactions of cis-3-hexen-1-ol, cis-3-hexenylacetate, trans-2-hexenal, and
linalool with OH and NO3 radicals and O3 at 296 ± 2 K, and OH radical formation yields from
the O3 reactions. Int. J. Chem. Kinet. 1995, 27, 941-955.

(25)  Schurath,  U.,  personal  communication. 

(26) Klein, R. A. Mass spectrometry of the phosphatidylcholines: dipalmitoyl, dioleoyl, and
stearoyl-oleoyl glyceryl phosphorylcholines. J. Lipid Res. 1971, 12, 123-131.

(27)  Klein,  R.  A.  Mass  spectrometry  of  the  phosphatidylcholines.  Fragmentation  processes  for 
dioleoyl  and  stearoyl-oleoyl  glyceryl-phosphorylcholine.  J.  Lipid  Res.  1971,  12,  628-634. 

(28) Wood, G. W. and Lau, P. Y. Analysis of intact phospholipids by field desorption mass
spectrometry. Biomed. Mass Spectrom. 1974, 1, 154-155.

(29) Wood, G. W.; Lau, P. Y. and Rao, G. N. S. Field desorption mass spectrometry of
phospholipids: Fragmentation of dipalmitoylphosphatidylcholine from comparison of d0, d4
and d9 species. Biomed. Mass Spectrom. 1976, 3, 172-176.

(30) Wood, G. W.; Lau, P. Y.; Morrow, G.; Rao, G. N. S. and Schmidt, D. E. Field desorption
mass spectrometry of phospholipids: Survey and structural types. Chem. Phys. Lipids 1977,
18, 316-333.

(31) Crawford, C. G. and Plattner, R. D. Ammonia chemical ionization mass spectrometry of
intact diacyl phosphatidylcholine. J. Lipid Res. 1983, 24, 456-460.

(32) Bisseret, P.; Nakatani, Y.; Ourisson, G.; Hueber, R. and Teller, G. Ammonia chemical
ionization mass spectrometry of lecithins on a gold support. Chem. Phys. Lipids 1983, 33,
383-392.

(33)  Demirev,  P.  A.  Californium-252  plasma  desorption  mass  spectrometry  of 

Biomed.  Mass  Spectrom.  1987,  14,  241-246. 



(34) Cotter, J. R. and Tabet, J. C. Laser desorption mass spectrometry: mechanisms and
applications. Int. J. Mass Spectrom. Ion Phys. 1983, 53, 151-166.

(35)  Fenwick,  G.  R.;  Eagles,  J.  and  Self,  R.  Mass-analyzed  ion  kinetic  energy  spectra  and 
B1E-B2  triple  sector  mass  spectrometric  analysis  of  phosphoinositides  by  fast  atom 
bombardment.  Biomed.  Mass  Spectrom.  1983,  10,  382-386. 

(36)  Jensen,  N.  J.;  Tomer,  K.  B.  and  Gross,  M.  L.  Fast  atom  bombardment  and  tandem  mass 
spectrometry  of  phosphatidylserine  and  phosphatidylcholine.  Lipids  1986,  21,  580-588 

(37) Jensen, N. J.; Tomer, K. B. and Gross, M. L. FAB MS/MS for phosphatidylinositol,
-glycerol, -ethanolamine and other complex phospholipids. Lipids 1987, 22, 480-489

(38) Lai, C. C.; Finlayson-Pitts, B. J. and Willis, W. A. Formation of secondary ozonides
from the reaction of an unsaturated phosphatidylcholine with ozone. Chem. Res. Toxicol.
1990, 3, 517-523.

(39)  Finlayson-Pitts,  B.  J.;  Pham,  T.  T.  H.;  Lai,  C.  C.;  Johnson,  S.  N.;  Luciogough,  L.  L.; 
Mestats,  J.  and  Iwig  D.  Thermal  decomposition  of  phospholipid  secondary  ozonides  - 
implications  for  the  toxicity  of  inhaled  ozone.  Inhalation  Toxicology.  1998,  10,  813-830. 

(40)  Marto,  J.  A.;  White,  F.  M.;  Seldomridge,  S.  and  Marshall,  A.  G.  Structural 
characterization  of  phospholipids  by  matrix-assisted  laser  desorption/ionization  Fourier 
transform  ion  cyclotron  resonance  mass  spectrometry.  Anal.  Chem.  1995,  67,  3979-3984. 



Table 1. Reactions and their rate constants used in the simulation of product formation from
ozone/nitric oxide mixtures in air.

Reaction                        Rate constant*
NO + O3 → NO2 + O2              1.8 × 10^-14 cm^3 molecule^-1 s^-1
NO + NO3 → 2 NO2                2.6 × 10^-11 cm^3 molecule^-1 s^-1
NO2 + O3 → NO3 + O2             3.2 × 10^-17 cm^3 molecule^-1 s^-1
NO2 + NO3 + M → N2O5            2.0 × 10^-12 cm^3 molecule^-1 s^-1
N2O5 + M → NO2 + NO3 + M        6.9 × 10^-2 s^-1
NO3 + NO3 → 2 NO2 + O2          8.5 × 10^-13 cm^3 molecule^-1 s^-1

* From CRC Handbook of Chemistry and Physics, 79th Edition, David R. Lide, ed., CRC
Press, Boca Raton, 1998, p. 5-105.


[Scheme 1, structural drawing not reproduced. Pathway shown: POPC + O3 → primary ozonide →
zwitterionic fragments (aldehyde + carbonyl oxide) → Criegee (secondary) ozonide → either
PC/ALD + AC/C9 or PC/AC + ALD/C9. PC/ALD and PC/AC denote the phosphatidylcholine whose
oleoyl chain has been cleaved to a (CH2)7CHO aldehyde or a (CH2)7COOH acid terminus,
respectively; ALD/C9 = OHC(CH2)7CH3 and AC/C9 = HOOC(CH2)7CH3 are the C9 aldehyde and acid.]

Scheme 1. Product formation in the reaction of POPC with ozone according to the Criegee
ozonation process.

Figure 1. Apparatus for carrying out the reaction of POPC with ozone and ozone/nitric oxide
mixtures in air.

Figure 2. Simulation of the first 1000 s of product development from a mixture
of 50 ppb of NO with

a) 100 ppb,

b) 150 ppb and

c) 200 ppb of ozone in air, using the rate constants from Table 1.

Figure 3. MALDI-FTMS positive ion spectra from single nitrogen laser shots on cryogenic
matrices of 3-nitrobenzyl alcohol in 3000:1 ratio with the following analytes:

a) pure POPC,

b) POPC after 10 min exposure to ozone and

c) reaction products of POPC with ozone (LHS) and ozone/nitrogen oxides mixture (RHS).



UNIVERSAL  METRIC  PROPERTIES  OF  THE  GENETIC  CODE 


Nikola  Stambuk 


Rudjer Boskovic Institute, Bijenicka 54, HR-10001 Zagreb, Croatia


Correspondence to: Dr. Nikola Stambuk, Subiceva 16, HR-10000 Zagreb, Croatia,
e-mail: stambuk@rudjer.irb.hr



ABSTRACT 

Universal metric properties of the genetic code (i.e. RNA, DNA and protein coding) are defined by
means of the nucleotide base representation on the square with vertices U or T = 00, C = 01, G = 10
and A = 11. It is shown that this notation defines the Cantor set and Smale horseshoe map
representations of the genetic code, the classic table arrangement and the Siemion one-step
mutation ring of the code. A Gray code solution to the problem with all codon positions, and an
extension to the octal coding system, are given. Finally, a unified concept of the genetic code
linked to the Cantor set and horseshoe map is introduced in the form of a classic combinatorial
4-colour necklace model with three horizontal frames of 64 coloured pearls (bases) and vertically
hanging decorations of triplets (codons). The three horizontal necklace frames define Crick's code
without comma, and the vertical necklace decorations define the evolutionary code. Thus, the type
of the code depends on the level or direction of the observation. Fibonacci dynamics and the
Cantor set-Farey tree partition of codon and amino acid groups are discussed and explained. This
method of genetic code analysis is named the SCA procedure.

INTRODUCTION

The protein coding and synthesis in biological systems is, together with all other information of the
genome, found in DNA and RNA strings consisting of 4 nucleotide base combinations (U or T, C, A
and G). The four bases define 64 codon triplets that specify 20 amino acids and 3 stop codons for
the  protein  synthesis.^'^  The  aim  of  this  paper  is  to  define  the  universal  metric  properties  of  the  codon 
and nucleotide base recombination. This will be done by addressing three dimensions of the problem,
as follows.

First, we show that the quadratic binary representation of the 4 bases on the unit square may be
projected on the Cantor set for all codons and amino acids. It is proved that, for the one-dimensional
projection, symbolic binary coordinates provide the Gray code solution to the problem of amino acid
coding. Counter-clockwise and clockwise changes of the base positions on the square define the link
between the classic genetic code table and the Siemion one-step mutation ring of the genetic code
(which is linked to the physico-chemical properties of the amino acids).

Second, we show that the Smale horseshoe map representation of binary and Cantor codon (amino acid)
positions defines the classic table of the genetic code. This result shows that the genetic code table
is a reflection of the standard horseshoe map, often used in the analysis of nonlinear and chaotic
systems. The possibility of analysis by binary and octal coding systems is discussed, and octal code
addresses for codons and amino acids are given.

Third, we show that a classic combinatorial 4-colour necklace problem, with each colour
representing a nucleotide base projection on the unit square, defines the unified concept of the
genetic code. Three horizontal frames of the necklace, consisting of 64 coloured pearls (bases),
make Crick's comma-less code, and vertically hanging decoration triplets (codons) define the
evolutionary code. Thus, it is proved that the necklace model defines both concepts, depending on
the level of observation and/or position of the observer.

Finally, the process of base, codon and amino acid recombination in the genetic code table is
discussed with respect to their Cantor set and Farey tree partition frequencies. It is emphasised that
this interaction leads to Fibonacci, i.e. golden ratio, based dynamics in selecting codon and amino
acid  families,  as  previously  discussed  by  Schroeder  and  Stambuk.*'®  This  method  of  genetic  code 
notation  and  analysis  is  named  SCA  procedure. 


RESULTS  AND  DISCUSSION 

Primary  coding  problem  -  metric  on  the  unit  interval 

The  notation 

We introduce the binary representation of the 4 nucleotide bases on the square with vertices 00, 01,
10, 11 in a manner defined for the Cantor set by H. Steinhaus in 1917 (when discussing interesting
properties of the set noticed by S. Banach).^ The notation U or T = 00, C = 01, G = 10 and A = 11
is presented in Figure 1. It has the following properties:

The combination of 2 digits (0 or 1), denoting primary and secondary characteristics of the
nucleotide base, describes each of the letters according to the group subdivision/discrimination
principles (1st digit purine-pyrimidine, 2nd digit strong-weak H bond discrimination). The first,
weakly H bonding pyrimidine base U or T = 00 is discriminated from the next, strongly H bonding
pyrimidine base C = 01 by the second digit. Full complementarity in obtaining the weak (A)
and strong (G) H bonding purines is achieved by 0 ↔ 1 changes of the pyrimidine digits (giving
A = 11 and G = 10), or vice versa.
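
The notation can be made concrete with a few lines of code. The sketch below is an illustration, not part of the original procedure; the function names are hypothetical. It encodes bases with the two-digit labels given above and shows that the 0 ↔ 1 switch of both digits maps each base to its Watson-Crick partner.

```python
# Two-digit labels from the text: U or T = 00, C = 01, G = 10, A = 11.
BASE_TO_BITS = {"U": "00", "C": "01", "G": "10", "A": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def switch_digits(bits):
    """Apply the 0 <-> 1 switch to every digit of a binary string."""
    return "".join("1" if digit == "0" else "0" for digit in bits)


def complement_base(base):
    """Base obtained by switching both digits (T is treated as U)."""
    base = "U" if base.upper() == "T" else base.upper()
    return BITS_TO_BASE[switch_digits(BASE_TO_BITS[base])]


def is_purine(base):
    """First digit 1 marks a purine (G, A); 0 marks a pyrimidine (U/T, C)."""
    base = "U" if base.upper() == "T" else base.upper()
    return BASE_TO_BITS[base][0] == "1"


def encode_codon(codon):
    """Six-digit binary address of a codon, e.g. 'AUG' -> '110010'."""
    return "".join(BASE_TO_BITS["U" if b == "T" else b] for b in codon.upper())


if __name__ == "__main__":
    for base in "UCGA":
        print(base, BASE_TO_BITS[base], "->", complement_base(base))
    print("AUG", encode_codon("AUG"))
```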

Codon  positions  on  the  binary  tree 

Table 1 shows the binary notation for all 64 codons and 20 amino acids. To define more precisely
the positions of particular codon intervals of the binary tree with respect to the quadratic base
mapping, we examine the invariant Cantor set with the method of symbolic dynamics in a standard
manner.^;* This was done because the Cantor set possesses two properties related to the binary
coding of the Figure 1 notation:^’*

1. binary decomposition of the initial segment into 2^n segments projected on the (n-1)th binary tree level,

2. partitioning of the observed set by excluding 1/3 of its original length at each of the tree levels.
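
The two properties can be illustrated with the standard middle-third construction. The sketch below is a simplified illustration: it uses the plain left-to-right ordering of the Cantor intervals (0 = left third, 1 = right third) and does not reproduce the alternating-tree symbolic coordinates of Table 1; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def cantor_interval(bits):
    """Return (left endpoint, length) of the Cantor interval addressed by `bits`.

    Each digit keeps the left (0) or right (1) third of the current interval,
    i.e. the middle third is excluded at every level.
    """
    left, length = Fraction(0), Fraction(1)
    for digit in bits:
        length /= 3
        if digit == "1":
            left += 2 * length
    return left, length


if __name__ == "__main__":
    n = 6  # six binary digits, one address per codon
    intervals = [cantor_interval("".join(b)) for b in product("01", repeat=n)]
    total = sum(length for _, length in intervals)
    print(len(intervals), "intervals, total length", total)  # 2^n pieces, (2/3)^n
```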

The relative location of different coding intervals and their orientation is additionally specified in
Table 1 by the nodes of the alternating binary tree and their symbolic coordinates (names). Briefly,
the left half of the unit interval is labelled 0 and the right one 1. With f(x) = λx(1-x), λ > 4, the
pairs of the initial binary tree preserve orientation for x < 1/2, where the derivative f'(x) > 0, and
reverse orientation for x > 1/2, where f'(x) < 0, in the alternating binary tree.⁸

Gray  code  solution  to  the  metric  problem 

Symbolic coordinates of codon and amino acid locations on the Cantor set in Table 1 represent the
Gray code solution to the n = 6 digit binary notation for 2^n = 64 codons. This result was published
by M. Gardner in 1972." Gardner's Gray numbers that solve the puzzle for n = 6 digits/rings are the
symbolic addresses of the different codons in Table 1, and the Cantor set solution to this problem
represents their projection onto the [0, 1] interval in order of appearance. Consequently, the
stretching and folding of the quadratic map with symbolic dynamics on the unit interval* keeps track
of the hypercube codon (amino acid) representations by means of the Gray code. The two-dimensional
representation is defined via the horseshoe map.

The unit interval Cantor mapping in Table 1 solves the complementary coding problem via the binary
tree codon projection, since the Gray code solution requires at least 32 binary numbers from the
first part of the table. Complementary addresses for the second half of the table are symmetrically
arranged at opposite Cantor positions and are obtained by the 0 ↔ 1 digit switch.
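
For orientation, a six-digit Gray code can be generated in a few lines. The sketch below uses the standard reflected binary Gray code; whether its ordering coincides with the symbolic addresses of Table 1 is an assumption that is not checked here.

```python
def gray(i):
    """Reflected binary Gray code of the integer i."""
    return i ^ (i >> 1)


def gray_to_binary(g):
    """Inverse of gray(): recover the position i from its Gray code."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i


def gray_codes(n=6):
    """All 2^n Gray code words of n digits, in order of appearance."""
    return [format(gray(i), f"0{n}b") for i in range(2 ** n)]


if __name__ == "__main__":
    codes = gray_codes(6)
    # Successive addresses differ in exactly one digit.
    for position, code in enumerate(codes[:8]):
        print(position, format(position, "06b"), code)
```

The defining property, that neighbouring words differ in a single digit, is what makes the code a path through the vertices of the 6-dimensional hypercube.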

Siemion's  mutation  ring  and  genetic  code  table 

Table 2 shows that permutations of the 4 amino acid families in the classic genetic code Table 3
(CUGA, UGAC, GACU and ACUG) identify Siemion's one-step mutation ring of the genetic code*®
presented in Table 4. This is done by means of UC/CU, AG/GA replacements (U row/column), C/G - G/C
mutation (C row/column), A/U mutation (A row/column) and C/G mutation (G row/column). The four
amino acid families are defined by a simple algorithm of the binary codon notation (Figure 1).

The extraction of Siemion's mutation ring from the standard genetic code table, by means of the
Cantor set based nucleotide notation (and algorithm) in Figure 1, is of considerable importance since
Siemion's ring is related to the physico-chemical properties of different amino acids.^'*®

Secondary  coding  problem  -  Horseshoe  map  metric  and  octal  coding 

Smale's  Horseshoe  Map 

The Smale horseshoe map is an example of a chaotic hyperbolic invariant set, and the map often
behaves like a skeleton on which chaotic and periodic orbits of the system are organized.*’** The
horseshoe is a mapping of the unit square (Figure 1) which contracts in the horizontal direction,
expands in the vertical direction, and then folds. The mapping is only defined on the unit square,
and points that leave the square are ignored.* Forward and backward iterations of the horseshoe map
generate the locations of the periodic points.*’”

Amino  acid  and  codon  horseshoe  mapping 

By iterating the map we specified the locations of periodic orbits within the homoclinic tangle of
the horseshoe. Table 5 gives the labelling scheme for horizontal and vertical branches from a pair of
alternating binary trees. The projections of 2 binary triplets (or 2 octal numbers) according to the
horseshoe pattern extract the standard table of the genetic code (Table 3), which proves that this
map defines the patterns of codon recombination buried in the code. Patterns of the first, second and
third base changes also satisfy and confirm the standard square notation with 4 binary addresses
presented in Figure 1, typical of the horseshoe map. The algorithm in Figure 1 is therefore confirmed
for the genetic code and Table 5 represents its proper labelling scheme.

Since the invariant horseshoe set is a product of two Cantor sets, obtained from intersections in the
horizontal and in the vertical direction,* the Cantor set projection of the genetic code is also
proved for the two-dimensional case.

Octal  coding 

Further extension of the coding system in Table 6 defines 28 pairs of all possible 8 (node)
3-dimensional cube permutations (i.e. as 8 x 8 codon octades).^’” Three binary digits define octal
number coding, and 28 = 8!/[2! (8-2)!] pairing combinations are obtained from the permutations of 2
corresponding octal numbers” (or 2 binary triplets). Eight identical doublets in addition to 56
different doublets define all 64 codons. This pattern is consistent with the three letter alphabet
permutation consisting of two binary, i.e. 0 or 1, choices in the truth table (2^3 = 8).^
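
The counting in this paragraph can be verified directly. The sketch below is illustrative: it splits the six-digit address of a codon (bases encoded as U or T = 00, C = 01, G = 10, A = 11) into two binary triplets and reads each as an octal digit. The helper name is hypothetical and the resulting addresses are not claimed to match the layout of Table 6.

```python
from itertools import combinations, product

BASE_TO_BITS = {"U": "00", "T": "00", "C": "01", "G": "10", "A": "11"}


def codon_to_octal(codon):
    """Two octal digits obtained from the two binary triplets of a codon."""
    bits = "".join(BASE_TO_BITS[base] for base in codon.upper())
    return f"{int(bits[:3], 2)}{int(bits[3:], 2)}"


# 28 unordered pairs of distinct octal digits: 8!/[2!(8-2)!].
unordered_pairs = list(combinations(range(8), 2))
# Each unordered pair gives two ordered doublets (56); 8 doublets are identical.
different_doublets = [(a, b) for a in range(8) for b in range(8) if a != b]
identical_doublets = [(a, a) for a in range(8)]

if __name__ == "__main__":
    print(len(unordered_pairs), len(different_doublets), len(identical_doublets))
    addresses = {"".join(c): codon_to_octal("".join(c)) for c in product("UCGA", repeat=3)}
    print(len(set(addresses.values())), "distinct octal addresses")
    print("AUG ->", addresses["AUG"])
```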
Tertiary coding problem - Necklace model of the genetic code

Circular  code  arithmetic  and  Necklace  coding 

The genetic and protein circular code is defined by means of a combinatorial necklace model.'* This
structure consists of 64 beads of 4 different colours representing the 4 nucleotide bases (U or T, C,
A, G). The coloured beads make decorations that consist of vertically hanging chains of x = 3
beads, each representing one of the codons. Consequently, there are y^x = 4^3 = 64 distinct vertical
chains that can be made (i.e. the number of words of length x = 3 with an alphabet of size y = 4).
The total number of possible vertical decorations containing at least two colours each is
y^x - y = 60, and y = 4 decorations contain beads of the same colour.

One of the characteristics of this system is that we may define "beheading" as the process where the
top bead is taken and replaced on the bottom.'* After some repetitions we observe the initial pattern.
Let b be the smallest positive number of successive beheadings (including reverse ones) needed to
get back the original; we have:

1 ≤ b ≤ 3, x = ab + c (0 ≤ c < b). (1)

The initial pattern is restored by x beheadings followed by a lots of b reverse beheadings, i.e. by
c beheadings alone, so that c = 0 and x = ab; if x is prime and b > 1 we have b = x, a = 1. By
observing the chains and their first x - 1 beheadings, different collections are made (that cannot be
transformed into each other). Thus, when the y^x - y chains have been accounted for, we get a total
of n collections with y^x - y = nx, i.e. y^x ≡ y mod x, from which we obtain Fermat's theorem.”*
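
The beheading argument can be checked by enumeration. The sketch below is an illustration with hypothetical function names: it groups all chains of a given length into collections that are closed under beheading and counts them.

```python
from itertools import product

BASES = "UCGA"  # the 4 colours of the necklace model


def behead(chain):
    """Take the top bead and replace it on the bottom."""
    return chain[1:] + chain[0]


def collections(length=3, alphabet=BASES):
    """Group all chains of the given length into beheading collections."""
    seen, result = set(), []
    for letters in product(alphabet, repeat=length):
        chain = "".join(letters)
        if chain in seen:
            continue
        orbit, current = [], chain
        while current not in orbit:
            orbit.append(current)
            current = behead(current)
        seen.update(orbit)
        result.append(orbit)
    return result


if __name__ == "__main__":
    groups = collections()
    same_colour = [g for g in groups if len(g) == 1]
    two_colour = [g for g in groups if len(g) == 3 and len(set(g[0])) == 2]
    three_colour = [g for g in groups if len(g) == 3 and len(set(g[0])) == 3]
    print(len(same_colour), len(two_colour), len(three_colour))
    # y^x - y = n * x for prime x: the n collections of x chains each.
    print((4 ** 3 - 4) // 3, (4 ** 3 - 4) % 3)
```

For x = 3 and y = 4 every chain with at least two colours needs b = 3 beheadings to return, so the 60 such chains fall into n = 20 collections of three, which is the instance of y^x ≡ y mod x used in the text.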

Coding  patterns  and  codon  collections 

Table 7.a-f presents the circular and complementary coding patterns for all possible codon
collections. It contains the two- and three-colour collections, each consisting of 3
transformed/beheaded codons (12 and 8 collections of 3 triplets, of 2 and 3 colours respectively,
i.e. 60 codons), and four triplets of the same colour. It is shown that there exists a codon
arrangement for each of the 3 horizontal necklace frames (m = mod 1, 2, 3) that is 100% identical to
the one empirically detected by Arques and Michel. Four triplets of the same colour link the
endpoints of the frames, enabling the construction of the three frame automaton (Table 8).

Frame  shifts  and  frame  retrieval 

Table 8 shows that the arrangement of the codons in the frames according to their projection on the
Cantor set transforms each frame in such a way that, if a one-letter shift is performed, the next
frame is automatically retrieved (a-d). The few letter changes that occur during the transformations
are permissible and predicted according to Molecular Recognition Theory (R↔S, Q↔H, D↔E) or the
N-end rule, i.e. the coding pattern is consistent with theoretical and empirical observations.

Unified  concept  of  the  genetic  code 

The presented results indicate that the concepts of a code without comma and of an evolutionary
code, based on different premises, strongly depend on the level of the observation/analysis. In the
necklace model Crick's code without comma^’^‘’ represents three horizontal frames that define
necklace chains, while Bounce's evolutionary code*’^^* makes vertically hanging beads (codon
triplets). Therefore the circular coding necklace algorithm represents a unifying concept of the
genetic code. This method, denoted SCA, enables genetic code and protein analysis via number theory
arithmetic for codes.

Concluding  remarks 
Fibonacci  dynamics  and  Farev  tree 

Two-dimensional Cantor set projection of the binary (square) notation via the Smale horseshoe map
reconstructs the classic table of the genetic code, which confirms our result and opens the possibility
of analysing genes and proteins as chaotic dynamical systems. Additionally, the closest intersections
of the Cantor set (binary and symbolic) and Farey tree codon projections define "golden amino acids"
(related to the Fibonacci dynamics). The Fibonacci dynamics, noticed in the algorithms of the
genetic code and in long range DNA correlation exponents, might arise from two mapping
frequencies of the code. The frequency of Cantor set projection/recombination of the codons (2/3) is
mixed with the frequency (1/2) of the Farey tree that splits the amino acid/codon groups upon each
Cantor set projection level (2^6 = 64). As shown by Schroeder, the frequency resulting from such a
Cantor set-Farey tree interaction approximates the golden ratio: (2 + 1)/(3 + 2) = 3/5 ≈ 0.618, which
explains the previously mentioned phenomena. Some mathematical and dynamical aspects of those
interactions have been discussed by Schroeder and Štambuk.
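
The mixing of the two frequencies can be checked with a small sketch of the Farey mediant, which adds numerators and denominators separately. The helper names `mediant` and `mediant_chain` are hypothetical; the only inputs taken from the text are the frequencies 1/2 and 2/3.

```python
import math
from fractions import Fraction


def mediant(a: Fraction, b: Fraction) -> Fraction:
    """Farey mediant: (p1 + p2) / (q1 + q2)."""
    return Fraction(a.numerator + b.numerator, a.denominator + b.denominator)


def mediant_chain(a: Fraction, b: Fraction, steps: int) -> list[Fraction]:
    """Repeatedly take the mediant of the two most recent fractions."""
    chain = []
    for _ in range(steps):
        a, b = b, mediant(a, b)
        chain.append(b)
    return chain


GOLDEN = (math.sqrt(5) - 1) / 2  # 0.6180...

if __name__ == "__main__":
    farey, cantor = Fraction(1, 2), Fraction(2, 3)
    for value in mediant_chain(farey, cantor, 5):
        print(value, float(value), abs(float(value) - GOLDEN))
```

The first mediant is 3/5; continuing the chain gives ratios of consecutive Fibonacci numbers (5/8, 8/13, ...), which move closer to the golden ratio at each step.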

TG/CT excess, TA/CG deficiency and language decoding

The binary coding quadratic algorithm (Figure 1), based on pyrimidine-purine (T-G) and strong-weak
H bonding (T-C) discrimination, is in accordance with the universal rule of TG/CT excess and
TA/CG deficiency in coding and noncoding DNA regions, since TA/CG does not satisfy the second
digit strong-weak discrimination (and consequently may be less likely to appear). Another important
aspect of this study relates to the discovery that non-coding DNA sequences possess properties
characteristic of natural languages, while coding DNA sequences correspond to coded
language structures. In this context the concept presented in this study may contribute to the
extraction/decoding of the programming language of DNA and RNA strings.
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Figure  1.  Binary  notation  of  the  4  nucleotide  bases  based  on  purine-pyrimidine  and  strong-weak  H 
bonding  principles.  Complementary  codon  pairs  and  Siemion's  one  step  amino  acid  mutation  ring 
are  defined  by  means  of  the  hypercube  node  distances  (Table  1)  and  related  permutations  of  Table  3 
with  4  nucleotide  families,  as  shown  in  Table  2.  Dotted  line  =  1st  base  permutation,  solid  line  =  2nd 
base  permutation,  •  =  start;  3rd  base  permutation  involves  CU  and  AG  pairs. 


Table 1. Binary and symbolic notation of RNA, DNA and amino acids.
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♦Corresponding  positions  of  the  Wall's  terminating  decimals  in  the  Cantor  set.^'*  Values  denote  positions  1/40, 
3/40,  1/10,  9/40,  1/4,  3/10,  13/40,  27/40,  7/10,  3/4,  31/40,  9/10,  37/40  and  39/40  respectively, 
aa  =  amino  acids;  U  =  T. 


Table 2. Rules for the permutation of the classic genetic code (Table 3) by the 4 nucleotide families of
the first two bases. The algorithm defines: (a) positions of the closest amino acids and (b) places of
all amino acid and stop codons in Siemion's one-step mutation ring. Figure 1 presents more details
on the binary notation and paths of the circular algorithm.


Table  3.  Classic  table  of  the  genetic  code. 
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Table  4.  Siemion's  one-step  mutation  ring  of  the  genetic  code.  Horizontal  bars  separate  three  periods 
of  the  code  to  A,  C  and  U  family.  Italics  denote  G  family  codons  which  are  distributed  in  different 
periods  (third  base:  Y  =  pyrimidine,  R  =  purine). 
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Table  5.  Horseshoe  map  representations  of  2  binary  triplets  (or  2  octal  numbers,  Table  6)  extract 
classic genetic code pattern (Table 3) and define codon mapping based on the unit square (Figure
1)  transformation. 


♦stop  codons 


♦♦start 


Table  6.  Binary  coding  of  the  genetic  code  and  protein  structure  may  be  transformed  into  octal  coding  system  based  on  3-dimensional  cube 
permutations.  The  octal  coding  system  has  several  advantages,  e.g.  database  compression  or  simple  two-dimensional  map  projection. 
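
The binary-to-octal transformation mentioned in the caption of Table 6 can be sketched as follows. The 2-bit assignment of the bases used here (U = 00, C = 01, G = 10, A = 11) is an assumption made for illustration only; the actual assignment is the one given in Figure 1 and Table 1. The function names are hypothetical.

```python
from itertools import product

# ASSUMPTION: illustrative 2-bit code for the bases, not confirmed by the source.
BASE_BITS = {"U": "00", "C": "01", "G": "10", "A": "11"}


def codon_to_binary(codon: str) -> str:
    """Encode a codon as a 6-bit string (T is treated as U)."""
    codon = codon.upper().replace("T", "U")
    return "".join(BASE_BITS[base] for base in codon)


def codon_to_octal(codon: str) -> str:
    """Regroup the 6 bits as two binary triplets and write each as an octal digit."""
    bits = codon_to_binary(codon)
    return f"{int(bits[:3], 2)}{int(bits[3:], 2)}"


if __name__ == "__main__":
    for codon in ("UUU", "AUG", "AAA"):
        print(codon, codon_to_binary(codon), codon_to_octal(codon))
```

Two octal digits per codon can also be read as coordinates on an 8 x 8 grid, which is one way to picture the two-dimensional map projection mentioned in the caption.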
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Table  7.  Circular  coding  patterns  of  the  4  colour  necklace  algorithm.  The  necklace  consists  of  three  horizontal  frames  with  vertically  hanging 
triplets (made of 1, 2 or 3 colours). It is shown that permutations of triplets in horizontal and in vertical direction make 8 collections of 3 colours,
12  collections  of  2  colours  and  4  triplets  of  1  colour.  The  exact  location  of  each  triplet  in  frames  given  in  Table  8  is  indicated.  In  all  tables  U 
denotes  U  or  T  situation,  i.e.  the  model  is  valid  for  both  RNA  and  DNA  coding.  Wst  =  opal,  Yst  =  amber  or  ochre  stop  codons. 


[Bodies of Tables 7 and 8: layout lost in extraction. Recoverable fragments: column headers "aa", "Codon" and "Frame"; the label "complements <=>"; codons CCA, CAC, ACC, UUG, UGU, GUU in one panel and GGU, GUG, UGG, AAC, ACA, CAA in another.]


Table 8. Necklace model of the genetic code. In horizontal directions we observe circular coding patterns of 3 necklace frames that make Crick's
comma-less code, while vertical directions define 64 hanging codons of the evolutionary code (a-d). Wst = opal, Yst = amber or ochre stop codons.
Arrangement of codons in frames according to their projection on the Cantor set transforms each frame in such a way that when a one-letter shift is
performed the next frame is automatically retrieved (a-d).
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SUMMARY 

Cathepsin  D,  a  protease  with  the  capability  of  degrading  matrix  proteins,  is  implicated  in  the 
process  of  breast  and  colorectal  cancer  invasion  and  metastasis.  Biochemical  studies  in 
laryngeal  cancer  have  shown  a  potential  prognostic  significance  of  cathepsin  D  content 
determination.  We  studied  immunohistochemical  positivity  of  cathepsin  D  in  tumor  epithelium 
and  stroma  of  61  surgical  specimens  of  squamous  cell  laryngeal  cancer.  Immunohistochemical 
reaction  was  quantitatively  assessed  using  a  PC-based  image  analysis  system  SFORM-VAMS. 
The  results  were  correlated  to  clinical  and  morphological  parameters  and  survival. 
Immunohistochemical  positivity  was  noted  in  neoplastic  cells  and  tumor  stroma.  Significant 
prognostic  value  for  cathepsin  D  was  established  separately  for  epithelial  tumor  component 
and  tumor  stroma  using  log-rank  test,  the  Cox  proportional  hazards  regression  model  and 
C4.5  machine  learning  system.  In  all  groups,  patients  above  the  median  cathepsin  D  staining 
showed significantly shorter survival time. The C4.5 machine learning system extracted cut-off
values for the decision tree that defines the probabilities of patient survival and death with
high sensitivity (92.8% alive, 73.6% dead), 100% specificity and 86.9% accuracy. This makes
immunohistochemical cathepsin D estimation an independent prognostic parameter in
laryngeal carcinomas within a 5-year period from the time of tumor surgery.


Key words: Laryngeal carcinoma, Immunohistochemistry, Cathepsin D, Prognosis, Data
structure, Machine learning, C4.5 classifier
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INTRODUCTION 

Cathepsin D is a lysosomal acidic protease thought to be closely associated with tumor
invasion or metastasis due to its capability of degrading extracellular matrix. In
histopathological and clinical studies, overexpression of cathepsin D was connected with
aggressive tumor behaviour in different neoplastic diseases. Immunochemically, the
presence of cathepsin D was demonstrated in normal laryngeal mucosa and in primary
laryngeal squamous cell carcinomas (SCC). Recently, a study using radioimmunoassay
correlated high cathepsin D content with a poor prognosis, independent of lymph node
status. Immunohistochemically, cathepsin D was demonstrated in neoplastic and normal
laryngeal mucosal cells as well as in stromal macrophages. The aim of our study was to correlate
immunohistochemical expression of cathepsin D with clinical and morphological parameters as
well as survival in laryngeal SCC. We compared the analysis of the results by means of the
standard log-rank test and the Cox proportional hazards regression model with their evaluation
by means of the C4.5 machine learning system, which extracts a decision tree for the
classification of patient survival.


MATERIAL  AND  METHODS 


Patients 

We  investigated  61  consecutive  cases  of  previously  untreated  laryngeal  SCC  patients  (males 
and  smokers),  without  detectable  distant  metastases,  randomly  selected  from  our  archives. 
Figures 1 and 2 illustrate the data concerning TNM stages and histopathologic type
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Immunohistochemistry 

The tissue samples were collected from laryngectomy specimens, fixed in 10% buffered
formalin, routinely processed and embedded in paraffin. Sections of 3 μm were mounted on
silanised slides and stained immunohistochemically with an anti-cathepsin D antibody
(DAKO, Glostrup) using the avidin-biotin method (ABC, Vector, Burlingame) according to the
manufacturer's specification (Picture 1, cathepsin D = dark brown staining). All the slides were
stained in one batch, by one technical assistant.

Image  analysis 

Immunoreactivity was analysed by two of the authors (SS and AY) on a Leitz Diaplan
microscope, using a PC-based image analysis system SFORM-VAMS (Zagreb, Croatia;
http://www.vams.com) and a CCD camera (JVC TK 1270). Background lighting was kept
constant and uniform, and a standard blue filter was used. The immunoreactive area was
assessed as a percentage of the total area analysed (4 fields, objective ×25). Immunoreactivity was
analysed separately for epithelial and stromal cells.

Data  analysis 

The analysed variables were: cathepsin D immunoreactivity (separately for tumor stroma and
tumor epithelium), histopathologic grade (grades I and II together), clinical TNM stage and
survival. Analysis was made using Kaplan-Meier curves and the log-rank test, the Cox
proportional hazards regression model and the C4.5 decision tree learning algorithm.

C4.5  machine  learning  program 

The program C4.5 is a successor of the basic ID3 decision tree learning algorithm. C4.5
generates a classifier in the form of a decision tree whose elements are either leaves or decision
nodes. A leaf indicates a class, and a decision node specifies the test to be carried out on
an attribute value, with one branch and subtree for each possible result of the test. The
starting node is the root node, and a tree is used to classify a case by starting at the root and
moving through the tree until a leaf is encountered. The C4.5 algorithm presumes the
existence of an appropriate number of learning examples described by a set of attributes and by
classes representing conditions. It searches for the most informative attribute according
to the gain criterion and constructs the decision tree. This search is based on Shannon's
measure of information. Pruning is used to reduce the decision tree, i.e. to produce a
more comprehensible structure without compromising accuracy on unseen cases. For
any tree, every path leads to a leaf and corresponds to a decision rule that is a logical conjunction of
various tests. If there are multiple paths for a given class, then the paths represent logical
disjunctions. All paths are mutually exclusive. For any new case, one and only
one path in the tree will always be satisfied.
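
The gain criterion based on Shannon's measure of information can be sketched for a single numeric attribute. This is a minimal illustration of plain information gain with a binary threshold split, not a reimplementation of C4.5 (which, for instance, can also normalise the gain and prunes the tree). The function names and the toy data in the example are hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values: list[float], labels: list[str], threshold: float) -> float:
    """Entropy reduction obtained by splitting at value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    remainder = sum(len(part) / len(labels) * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


def best_threshold(values: list[float], labels: list[str]) -> float:
    """Pick the observed value whose split gives the highest gain.

    Requires at least two distinct attribute values.
    """
    candidates = sorted(set(values))[:-1]  # the largest value cannot split the data
    return max(candidates, key=lambda t: information_gain(values, labels, t))


if __name__ == "__main__":
    # Made-up toy data, not patient data from this study.
    staining = [1.0, 2.0, 3.0, 4.0]
    outcome = ["alive", "alive", "dead", "dead"]
    cut = best_threshold(staining, outcome)
    print(cut, information_gain(staining, outcome, cut))
```

The attribute and threshold with the highest gain become the test at a decision node, and the procedure is repeated on each resulting subset.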

Sensitivity, specificity and accuracy of the procedures were obtained in the standard way.
Predictivity, i.e. reliability of the classifier predictions, was calculated as the ratio of the number
of true predictions to the size of the appropriate prediction class (alive or dead).

RESULTS 

Histopathological  analysis 

Follow-up of the patients ranged from 4 to 108 months with a median of 60 months. 42 patients
were censored (group alive) and 19 completely observed (group dead). Cathepsin D
immunoreactivity was histologically observed in normal and neoplastic tissue. In normal
laryngeal mucosa next to the tumor, diffuse, weak cytoplasmic positivity was noted. Scarce
reactivity was also present in stromal macrophages. Neoplastic epithelial cells showed mostly
diffuse positivity ranging from occasional cells to the majority of cells, with a slightly stronger
expression in more dedifferentiated cells. In the tumor stroma abundant immunoreactive cells
(macrophages) were noted. Scarce apical reaction in the stroma of the salivary gland cells was
also noted.

Kaplan-Meier  curves  and  log-rank  test 

In the tumor epithelium cathepsin reactivity ranged from 0.06% to 5.63% with a median of
1.35%, while in the stroma it ranged from 0.62% to 42.02% with a median of 9.86%
(Figures 3 and 4). We compared cathepsin D expression with conventional prognostic factors.
There was no significant correlation of cathepsin D immunoreactivity with clinical TNM
stages (Spearman rank order correlation = 0.24, p > 0.05) or with
histopathological grading (Spearman rank order correlation = 0.08, p > 0.05).
Immunoreactivity for cathepsin D, in both tumor epithelial cells and stroma,
showed a strong influence on patient survival (Figures 5, 6). Clinical TNM status (Chi-square =
3.896, df = 3, p = 0.273) and histological grade (Chi-square = 3.739, df = 2, p = 0.154)
showed no significant influence on patient survival.
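
For readers unfamiliar with the survival curves referred to above, the Kaplan-Meier product-limit estimate can be sketched in a few lines. This is a generic illustration, not the software used in the study; the function name `kaplan_meier` and the follow-up data in the example are made up.

```python
def kaplan_meier(times: list[float], events: list[bool]) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each time where a death was observed.

    times  : follow-up time of each patient (e.g. months)
    events : True if death was observed, False if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Made-up follow-up data, not patient data from this study.
    months = [5, 10, 10, 20]
    died = [True, False, True, True]
    for t, s in kaplan_meier(months, died):
        print(t, round(s, 3))
```

Censored patients lower the number at risk without lowering the curve, which is why the 42 censored cases still contribute information to Figures 5 and 6.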

The  Cox  proportional  hazards  regression  model 

The Cox proportional hazards regression model suggested that only cathepsin D
immunoreactivity in epithelial cells has a statistically significant effect (p < 0.05, Table 1).

C4.5  machine  learning  system 

For the analysis with the C4.5 classifier the patients were divided into two groups (dead or alive
after 60 months, i.e. 5 years following the surgical procedure). The decision tree extracted
cathepsin D epithelial and stromal tumor cell staining as the most significant attributes for the
classification of the alive or dead groups of patients, i.e. the prognosis. The classification rules
depicted in Figure 7 can be read as follows:

1. The first rule says that if the value of RED (epithelial cathepsin D) is less than or equal
to 2.33% then a patient belongs to the group alive.

2. If that rule is not satisfied (i.e. RED - epithelial cathepsin D is more than 2.33%) then
the group is alive when BLUE - stromal cathepsin D is above 38%.

3. The group is dead when RED - epithelial cathepsin D is more than 2.33% and BLUE -
stromal cathepsin D is less than or equal to 38%.
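
The three rules above can be written as a small function. This is a sketch of the rules as stated in the text; the function and parameter names are hypothetical, and the cut-off values 2.33% and 38% are those quoted above.

```python
def classify_patient(
    epithelial_pct: float,
    stromal_pct: float,
    epithelial_cutoff: float = 2.33,
    stromal_cutoff: float = 38.0,
) -> str:
    """Predict the 5-year group ("alive" or "dead") from cathepsin D positive area (%).

    epithelial_pct : RED, cathepsin D positive area in tumor epithelium
    stromal_pct    : BLUE, cathepsin D positive area in tumor stroma
    """
    if epithelial_pct <= epithelial_cutoff:  # rule 1
        return "alive"
    if stromal_pct > stromal_cutoff:         # rule 2
        return "alive"
    return "dead"                            # rule 3


if __name__ == "__main__":
    # Median values reported in the Results: 1.35% (epithelium), 9.86% (stroma).
    print(classify_patient(1.35, 9.86))
```

Because the tests are nested, each pair of measurements satisfies exactly one rule, in line with the mutual exclusivity of decision tree paths.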

The test showed high sensitivity by accurately predicting 5-year survival following the surgical
procedure in 92.8% of the patients with laryngeal squamous cell carcinoma (Figure 7). The
classifier's sensitivity in predicting death due to tumor progression within the 5-year period
following the surgery was a satisfactory 73.6% (Figure 7). The specificity of the test was 100%,
since the decision tree evaluation is made on the tumor tissue, which is absent in the normal laryngeal
immunohistochemical sample. Therefore all patients without tumor have a priori a negative test
result with respect to specificity evaluation. The accuracy of the test was also high (86.9%)
and the reliability of the classifier's prediction (predictivity) was 88.6% for the group alive and
82.3% for the group dead (Figure 7).
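
The percentages above follow from the counts given with Figure 7 (39 of 42 alive and 14 of 19 dead patients classified correctly). A short sketch of the arithmetic, with hypothetical function and key names:

```python
def summarise(alive_total: int, alive_correct: int, dead_total: int, dead_correct: int) -> dict:
    """Sensitivity, predictivity and accuracy from per-class counts."""
    # Patients predicted alive: correctly classified alive plus misclassified dead.
    predicted_alive = alive_correct + (dead_total - dead_correct)
    predicted_dead = dead_correct + (alive_total - alive_correct)
    return {
        "predicted_alive": predicted_alive,
        "predicted_dead": predicted_dead,
        "sensitivity_alive": alive_correct / alive_total,
        "sensitivity_dead": dead_correct / dead_total,
        "predictivity_alive": alive_correct / predicted_alive,
        "predictivity_dead": dead_correct / predicted_dead,
        "accuracy": (alive_correct + dead_correct) / (alive_total + dead_total),
    }


if __name__ == "__main__":
    for name, value in summarise(42, 39, 19, 14).items():
        print(name, value)
```

The fractions reproduce the reported 39/44 and 14/17 for predictivity. The reported 92.8%, 73.6% and 82.3% appear to be truncated rather than rounded values of 39/42, 14/19 and 14/17.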

DISCUSSION 

Expression of cathepsin D has been analysed in different malignancies such as breast cancer,
melanoma, colorectal cancer and head and neck cancer, as well as in childhood and nervous system
neoplasms. It was suggested that cathepsin D can play a role in tumor cell proliferation by
growth factor activation, or promote tumor invasion and metastasis by activating proteolytic
enzymes.

In laryngeal carcinomas, different studies using immunometric assays established a higher
cathepsin D content in tumor tissue samples compared with normal laryngeal mucosa.
Immunohistochemically, strong reactivity was demonstrated for cathepsin D in tumor cells and
in the tumor stroma macrophages that infiltrate the tissue. In this context, the very high stromal
cathepsin D value linked to survival in a subgroup of patients (Figure 7) may be due
to an enhanced local immune response to tumor antigens.

The C4.5 decision tree learning algorithm (Figure 7) was superior to Cox's model (Table 1)
regarding the analysis of data structure, since it extracted cut-off values of both epithelial and
stromal cathepsin D content relevant for survival. Similarly to C4.5, the log-rank test showed
the statistical significance of epithelial and stromal cathepsin D content for patient survival
(Figures 5, 6), but failed to explain the good prognosis for the subgroup of patients with
extremely high stromal cathepsin D content.

Stromal cathepsin D values of < 38%, linked to tumor progression (Figure 7), probably
reflect enhanced protease activity related to metastasis and a less pronounced immune
response. It is worth mentioning that, with respect to the C4.5 based classification, the cut-off
prognostic value of stromal cathepsin D is the descendant node of the best predicting attribute
(i.e. epithelial tumor cathepsin D), which represents the most informative node and the root node of the
tree. The selection of tumor epithelial cathepsin D as the best attribute by the C4.5 classifier is in
agreement with the fact that laryngeal squamous cell carcinoma is an epithelial neoplasm.
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Prognostic  significance  was  recently  assumed  for  radiometrically  measured  cathepsin  D 
levels; however, to our knowledge this is the first immunohistochemical study
demonstrating strong prognostic significance of cathepsin D in laryngeal cancer. Besides being
highly prognostic, this type of immunochemical test is relatively cheap and easy to perform,
which makes the combination of quantitative immunohistochemical analysis of cathepsin D
and C4.5 based data classification a potent prognostic tool in laryngeal cancer patients.

Although surgical treatment is a condicio sine qua non in the therapy of laryngeal squamous
cell carcinoma, it seems that cathepsin D content in histopathology samples of the tumor cells
represents an important predictive factor for tumor recurrence and aggressive behaviour. In this
study the data analysis was not influenced by other therapeutic procedures (e.g. chemotherapy
or radiotherapy), because surgery is the primary therapeutic procedure for the type
and stage of laryngeal neoplasm we observed (Figure 1). It is worth mentioning that all of our
patients were males and smokers.

In our study the expression and localisation of cathepsin D immunoreactivity agreed with data
obtained by others for laryngeal neoplasms. From our results, cathepsin D seems to be an
independent prognostic marker in primary laryngeal carcinomas, which confirms the
hypothesis of Marsigliante et al. The decision tree extracted by means of the C4.5 classifier was
shown to be a valuable tool for defining highly sensitive, specific, accurate and predictive cut-off
values for immunohistochemical cathepsin D data. It remains an open question whether this method
of analysis could be applied to other tumors (e.g. breast, colorectal and gastric cancer)
with an established link between aggressive neoplastic behaviour and cathepsin D content.
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Fig  4.  Distribution  of  patients  by  cathepsin  D 
positive  area  ( %  )  in  tumor  stroma 


Fig  5.  Survival  analysis  for  cathepsin  D  positive  area  ( %  ) 
in  the  epithelial  component  of  tumor  ( Kaplan  -  Meier  ) 

°  Complete  response  +  Censored  observations 
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Fig  6.  Survival  analysis  for  cathepsin  D  positive  area  ( % ) 
in  tumor  stroma  ( Kaplan  -  Meier ) 

°  Complete  response  +  Censored  observations 


Fig  7.  Decision  tree  obtained  by  C4.5  machine  learning  system 


SENSITIVITY (alive) = 92.8 % = 39/42
SENSITIVITY (dead) = 73.6 % = 14/19
PREDICTIVITY (alive) = 88.6 % = 39/44
PREDICTIVITY (dead) = 82.3 % = 14/17
SPECIFICITY = 100 %
ACCURACY = 86.9 %


Table 1. Cox proportional hazards model for clinical stage, cathepsin D content in the epithelial
and stromal components of the tumor, and histopathological grade

Cox model: Chi-square = 18.3673, df = 4, p = 0.0015

Variables                  Estimate   Standard error   t-value   two-sided p value
Clinical stage               0.43         0.28           1.51         0.13
Cathepsin D (epithel)        0.83         0.32           2.30         0.009
Cathepsin D (stroma)        -0.04         0.05           0.96         0.46
Histopathological grade      0.16         0.29           0.54         0.59
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SUMMARY 


An imbalance between urinary promoting and inhibiting factors has been suggested as more important
in urinary stone formation than a disturbance of any single substance. To investigate the value of
promoter/inhibitor ratios for estimation of the risk of urolithiasis, urinary citrate/calcium,
magnesium/calcium × oxalate and oxalate/citrate × glycosaminoglycans ratios were determined in 30
children with urolithiasis, 36 children with isolated hematuria, and 15 healthy control children. The
cut-off points between normal children and children with urolithiasis, and the accuracy, specificity and
sensitivity for each ratio, were determined and compared with those of the 24-h urine calcium and
oxalate excretion and urine saturation calculated with the computer program EQUIL2. The neural
network application (aiNET Artificial Neural Network, version 1.25) was used for the determination
of the cut-off points for the classification of the normal children and urolithiasis groups. The best test
for differentiating stone formers from non-stone formers proved to be the aiNET determined cut-off values
of the oxalate/citrate × glycosaminoglycans ratio. The method showed 97.78% accuracy, 100% sensitivity
and 93.33% specificity. Two cut-off points between the normal and urolithiasis groups were found,
showing that the children with urolithiasis had ratio values either above 34.00 or less than 10.16.
Increased oxalate excretion was linked to the first cut-off value (34.00) and decreased
glycosaminoglycans excretion was typical of the second cut-off value (10.16).

Key  words:  urinary  stones,  promoter/inhibitor  ratios,  risk  of  urolithiasis,  neuronal  network. 
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INTRODUCTION 


A number of urinary promoting and inhibiting factors have been examined extensively over the
years to investigate the risk of stone formation. It has been shown that no single promoter or inhibitor
can classify a particular individual clearly enough as healthy or sick. A combination of factors
seems to provide better separation of stone formers from normal subjects. Several ratios between
promoting and inhibiting factors, such as calcium × oxalate/creatinine × magnesium, calcium/citrate,
magnesium/calcium × oxalate, oxalate/calcium, citrate/calcium and oxalate/citrate × glycosaminoglycans,
as well as more sophisticated methods that take into account a number of urinary components,
were used to detect the imbalance between the promoting and inhibiting factors leading to stone
formation. We examined 11 single urinary factors potentially promoting or inhibiting crystallization,
and urine saturation calculated with the computer program EQUIL2, in children with isolated hematuria and overt
urolithiasis, and compared the findings with those of normal healthy children. In our previous
report urine saturation was found to be the best parameter for the estimation of the relative risk of
urolithiasis. However, logistic regression failed to classify 14.59% of the group members correctly.
The aim of the present study was to evaluate the value of promoter/inhibitor ratios for
estimation of the risk of urolithiasis. Those tests are simpler, easier and cheaper for routine clinical
practice than EQUIL2. Citrate/calcium, magnesium/calcium × oxalate and oxalate/citrate ×
glycosaminoglycans ratios were chosen for this purpose because all of them take into account the
major urinary stone promoting and inhibiting factors. Neuronal network analysis by means of aiNET
Artificial Neural Network (version 1.25) was used to determine the cut-off points between the normal
and urolithiasis groups. Accuracy, specificity and sensitivity were calculated for each of these ratios
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and  compared  with  those  of  the  24-h  urine  calcium  and  oxalate  excretion  and  urine  saturation. 

PATIENTS  AND  METHODS 

Patients.  Thirty  children  with  urolithiasis  and  36  children  with  isolated  hematuria  were 
investigated.  A  group  of  15  healthy  sex  and  age  matched  children  without  any  nephrourological 
disease  or  pathological  condition  that  might  influence  urine  composition  served  as  controls. 

In  children  with  hematuria,  glomerular  diseases,  urinary  infection,  urological  anomalies  and 
coagulopathy  were  excluded  before  entering  the  study.  If  a  checkup  of  serum  and  urine  electrolytes 
revealed  hypercalciuria,  the  known  causes  of  hypercalciuria  (renal  tubular  acidosis,  hypercalcemic 
conditions)  were  excluded. 

In  children  with  urolithiasis  ultrasonography  and/or  urography  established  the  diagnosis. 
Cystinuria  and  hyperuricosuria  were  excluded. 

The  children  were  enrolled  in  the  study  with  informed  parental  consent. 

Urine  sampling  and  analysis.  From  each  child,  24-h  urine  collections  performed  on  two  consecutive 
days  and  one  urine  sample  collected  from  8  to  10  a.m.  on  the  third  day  were  obtained  for  analysis. 
The  24-h  urine  of  the  first  day  served  for  measuring  creatinine,  calcium,  sodium,  potassium,  oxalate, 
phosphate,  magnesium,  citrate  and  sulphate.  It  was  collected  in  a  wide-mouthed  plastic  bottle 
containing  10  ml  6N  hydrochloric  acid  as  preservative.  The  24-h  urine  of  the  second  day  was 
collected  in  the  same  way  but  without  the  addition  of  hydrochloric  acid  in  the  bottle.  It  served  for 
measuring  chloride,  urate,  GAGs  and  creatinine.  The  2-hour  urine  collected  on  the  third  day  served 
for  ammonium  and  creatinine  measuring.  In  this  sample  500  mg  di-potassium-oxalate  was  added 
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immediately  after  voiding  to  prevent  ammonium  decomposition. 

The pH of urine was measured with indicator sticks (Boehringer Mannheim, Germany). Oxalate,
citrate and sulphate were measured using a Dionex Series 4000i gradient ion chromatography system
(Dionex Co, Sunnyvale, CA, USA). GAGs were measured by the carbazole method, ammonium by
glutamate dehydrogenase (Da Fonseca-Wollheim) and magnesium by atomic absorption
spectrophotometry. The following analyses were done on an Olympus AU 800 Analyser: creatinine by
the standard kinetic Jaffe procedure, sodium, potassium and chloride by ion selective electrodes,
calcium by the cresolphthalein-complexon method, phosphate by the molybdate method and uric acid
by the uricase method.

From the values of urinary 24-h volume, pH of urine, calcium, sodium, potassium, chloride,
magnesium, phosphate, sulphate, ammonium, urate, oxalate, citrate and creatinine (mmol/L), the
urinary calcium oxalate saturation was calculated by the computer program EQUIL2. Also, the 24-h
urinary excretion expressed as a ratio to creatinine was calculated for each of the measured
urinary components.

Data analysis. Data were presented as medians with minimum and maximum values. Cut-off
values between normal children and children with urolithiasis were determined using a neuronal
network application (aiNET Artificial Neural Network Version 1.24, Celje, Slovenia).

The artificial neural network aiNET is based on a self-organising system, called a neural network-like
system, and it is very similar to Kohonen's self-organisation process. The algorithms used by
aiNET do not require any learning phase and the answers about prediction are obtained almost
immediately. When the data are chaotic and there is no possible solution, aiNET will suggest a data
problem.
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Accuracy,  specificity  and  sensitivity,  as  well  as  95%  confidence  interval  for  determined  cut-off  were 
calculated. 
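
The three validity indexes can be illustrated with a short calculation. The sketch below is not the authors' software; it assumes the usual definitions (sensitivity = true positives / all stone formers, specificity = true negatives / all normal children, accuracy = correct classifications / all children). The counts are inferred, as an assumption, from the percentages in Table 2 together with the group sizes of 30 children with urolithiasis and 15 normal children.

```python
def validity_indexes(tp, fn, tn, fp):
    """Return (accuracy, specificity, sensitivity) in percent.

    tp/fn: stone formers classified correctly / incorrectly
    tn/fp: normal children classified correctly / incorrectly
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return accuracy, specificity, sensitivity


# Counts inferred (assumption) from Table 2: 30 stone formers, 15 normal children.
# Urine saturation: 26 of 30 stone formers and 14 of 15 normal children classified correctly.
acc, spec, sens = validity_indexes(tp=26, fn=4, tn=14, fp=1)
print(f"{acc:.2f} {spec:.2f} {sens:.2f}")  # 88.89 93.33 86.67

# Oxalate/citrate x glycosaminoglycans ratio: all 30 stone formers, 14 of 15 normal children.
acc, spec, sens = validity_indexes(tp=30, fn=0, tn=14, fp=1)
print(f"{acc:.2f} {spec:.2f} {sens:.2f}")  # 97.78 93.33 100.00
```

The confidence intervals are left out on purpose, because the text does not state which interval method was used.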

RESULTS 


Table 1 shows the median (minimum - maximum) values of the 24-h urinary excretion of calcium
and oxalate, urine saturation, and the citrate/calcium, magnesium/calcium × oxalate and
oxalate/citrate × glycosaminoglycans ratios. Cut-off points between normal children and children
with urolithiasis could be determined for all variables except the magnesium/calcium × oxalate
ratio, whose data were too dispersed for such discrimination (Table 2). Children with urolithiasis
had urine saturation, calcium/creatinine and oxalate/creatinine at or above 4.70, 0.20 mmol/mmol
and 48.00 mmol/mol, respectively, and a citrate/calcium ratio at or below 1.38. For the
oxalate/citrate × glycosaminoglycans ratio two cut-off points were found: children with
urolithiasis had ratio values either above 34.80 or below 10.16. The most accurate method for
discriminating normal from sick children was the oxalate/citrate × glycosaminoglycans ratio, which
was 100% sensitive and highly specific, with only 6.67% false positive results. Next came the
citrate/calcium ratio and urine saturation, the former with better sensitivity and the latter with
better specificity. All children with urolithiasis had at least 3 of the 5 examined variables in the
pathological range, and in 19 out of 30 (63.3%) children all variables showed pathological values
(Table 3). In contrast, all normal children except 1 had no more than 2 variables in the
pathological range. In children with hematuria the results were dispersed, although a tendency
towards a higher number of pathological variables was noticed.

DISCUSSION

It seems reasonable to consider urolithiasis a multifactorial disorder in which the risk of stone
formation depends on a disturbance in the balance of promoting and inhibiting factors. In our
previous study we showed that urine saturation estimates the relative risk of urolithiasis better
than any single urinary constituent^15. However, the determination of urine saturation may be
inconvenient for routine clinical practice, being time consuming and expensive. Therefore, we tried
to find a simpler parameter with high sensitivity to discriminate stone formers from healthy
children. Saturation can also be expressed in terms of ratios between the urine concentrations of
2 or 3 substances involved in lithogenesis. Among the ratios examined in this study, the
oxalate/citrate × glycosaminoglycans ratio proved the best. Baggio et al. first suggested this ratio
as a simple method for detecting the imbalance between promoting and inhibiting factors and found
abnormally high ratio values in children with idiopathic urolithiasis^6. The ratio can differentiate
more than 80% of stone formers from non-stone formers. In the present study not only very high but
also very low values of the ratio were found in children with urolithiasis in comparison with normal
children. Ratio values above the upper limit of normal belonged to patients with increased oxalate
excretion, while ratio values below the lower limit of normal reflected relatively decreased
glycosaminoglycans excretion. In our previous study^15 standard statistical methods (two-way
analysis of variance and Tukey HSD test with correction for unequal N) could not detect an influence
of glycosaminoglycans on the differentiation between normal children and the urolithiasis group. The
result of the neural network classification (Table 2), based on the artificial intelligence method
of data analysis, implies that decreased glycosaminoglycans values may influence stone formation in
a sub-population of stone formers. The use of two cut-off points made it possible to increase the
accuracy of the oxalate/citrate × glycosaminoglycans ratio in differentiating between stone formers
and normal children from 80% to 97.78%. Only one healthy child had a pathologically high ratio value
due to unexpectedly high glycosaminoglycans excretion, a finding that is difficult to interpret.

The citrate/calcium ratio also proved to be a very good discriminator between stone formers and
normal children. Although somewhat less accurate and specific than the oxalate/citrate ×
glycosaminoglycans ratio (Table 2), the citrate/calcium ratio has the advantage of being easy to
determine in clinical practice.

The study showed once more that a disturbance of more than one of the substances involved
in lithogenesis must be present for stone formation. All children with urolithiasis had at least 3
pathological parameters, while all but 1 normal child had no more than 2 pathological parameters.
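
Counting pathological parameters for one child can be sketched as follows. This is an illustration only, not the aiNET procedure: the function name is hypothetical, the cut-off values are those listed in Table 2, and the treatment of values lying exactly on a cut-off is an assumption. The two example calls use the group medians from Table 1.

```python
def count_pathological(saturation, ca_creat, ox_creat, cit_ca, ox_cit_gag):
    """Count how many of the five variables fall on the urolithiasis side of
    the Table 2 cut-offs (boundary handling is an assumption)."""
    flags = [
        saturation >= 4.70,                        # urine saturation
        ca_creat >= 0.20,                          # calcium/creatinine, mmol/mmol
        ox_creat >= 48.00,                         # oxalate/creatinine, mmol/mol
        cit_ca <= 1.38,                            # citrate/calcium, mmol/mmol
        ox_cit_gag < 10.16 or ox_cit_gag > 34.80,  # two cut-off points
    ]
    return sum(flags)


# Group medians taken from Table 1
normal_median = count_pathological(1.91, 0.17, 46.00, 1.81, 19.09)
stone_median = count_pathological(19.59, 0.30, 75.00, 0.71, 195.49)
print(normal_median, stone_median)  # 0 5
```

With these cut-offs the median normal child has no pathological parameter and the median stone former has all five, in line with the distribution reported in Table 3.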

The  neural  network  analysis  of  the  laboratory  tests  related  to  the  important  and  common 
medical  problem  of  urolithiasis  proved  to  be  of  potential  clinical  value. 




LITERATURE 

1. Tiselius, H.G. Relationship between the severity of renal stone disease and urine composition.
Eur. Urol.  1919,  \5,S. 

2. Butz, M.; Schulte, P.; Knispel, H. Neue Gesichtspunkte zur Interpretation harnchemischer
Befunde beim Harnsteinleiden. Verh. Ber. 34, Tagg Dtsch Ges Urologie, Hamburg 1982.
Springer Verlag: Berlin, Heidelberg, New York, Tokyo, 1983, p 331.

3. Robertson, W.G. Physico-chemical aspects of calcium stone-formation in the urinary tract.
In: Urolithiasis Research, Fleisch, H.; Robertson, W.G.; Smith, L.H.; Vahlensieck, W., Ed.;
Plenum Press: New York, 1976; p 25.

4. Robertson, W.G.; Rutherford, A. Aspects of the analysis of oxalate in urine - a review. Scand
J Urol Nephrol, 1981, (Suppl.) 53:85.

5.  Perrone,  H.C.;  Toporovski,  J.;  Shor,  N.  Urinary  inhibitors  of  crystallization  in  hypercalciuric 
children  with  hematuria  and  nephrolithiasis.  Pediatr  Nephrol,  1996,  10,  435-437. 

6.  Baggio,  B.;  Gambaro,  G.;  Favaro,  S.;  Borsatti,  A.;  Pavanello,  L.;  Siviero,  B.;  Zacchello,  G.; 
Rizzoni,  G.F.  Juvenile  renal  stone  disease:  a  study  of  urinary  promoting  and  inhibiting  factors. 
J Urol, 1983, 130, 1133-1135.

7. Marshall, R.W.; Robertson, W.G. Nomograms for the estimation of the saturation of urine
with calcium oxalate, calcium pyrophosphate, magnesium ammonium phosphate, uric acid,
sodium acid urate, ammonium acid urate and cystine. Clin. Chim. Acta, 1976, 72, 253-260.

8. Robertson, W.G.; Peacock, M.; Marshall, R.W.; Marshall, D.H.; Nordin, B.E.C.
Saturation-inhibition index as a measure of the risk of calcium oxalate stone formation in the
urinary tract. N. Engl. J. Med. 1976, 294, 249-252.

9.  Finlayson,  B.  Calcium  stones:  some  physical  and  clinical  aspects.  In:  Calcium  Metabolism  in 
Renal  Failure  and  Nephrolithiasis,  David  D.S.,  Ed.;  Wiley:  New  York,  1977;  Chapter  10, 
p  337-382. 

10.  Robertson,  W.G.;  Peacock,  M.;  Heyburn,  P.J.;  Marshall,  D.H.;  Clark,  P.B.  Risk  factors  in 
calcium stone disease of the urinary tract. Br. J. Urol. 1978, 50, 449-452.

11.  Pak,  C.Y.C.;  Galosy,  R.A.  Propensity  for  spontaneous  nucleation  of  calcium  oxalate. 
Quantitative  assessment  by  urinary  FPR-APR  discriminant  score.  Am.  J.  Med.  1980,  69,  681- 
689. 

12. Pak, C.Y.C.; Skurla, C.; Harvey, J. Graphic display of urinary risk factors for renal stone
formation. J. Urol. 1985, 134, 867-870.

13. Tiselius, H.G. Measurement of the risk of calcium oxalate crystallization in urine. Urol. Res.
1985, 13, 297-300.

14. Werness, P.; Brown, C.M.; Smith, L.H.; Finlayson, B. EQUIL II: a BASIC computer
program for the calculation of urinary saturation. J. Urol. 1985, 134, 1242-1244.

15.  Milosevic,  D.;  Batinic,  D.;  Blau,  N.;  Konjevoda,  P.;  Stambuk,  N.;  Votava-Raic,  A.;  Barbaric, 
V.;  Fumic,  K.;  Rumenjak,  V.;  Stavljenic-Rukavina,  A.;  Nizic,  Lj.;  Vrljicak,  K.  Determination 
of  urine  saturation  with  computer  programe  EQUIL  2  as  a  method  for  estimation  of  the  risk 
of  urolithiasis.  J  Chem  Inf  Comput  Sci,  1998,  38,  646-650. 

16. Classen, A.; Hesse, A. Measurement of urinary oxalate: an enzymatic and ion
chromatographic method compared. J. Clin. Chem. Clin. Biochem. 1987, 25, 95-99.

17. Bitter, T.; Muir, H.M. A modified uronic acid carbazole reaction. Anal. Biochem. 1962, 4,
330-334.

18. Da Fonseca-Wollheim, F. Bedeutung von Wasserstoffionenkonzentration und ADP-Zusatz
bei der Ammoniakbestimmung mit Glutamatdehydrogenase. Z. Klin. Chem. Klin. Biochem.
1973, 11, 421-425.

19. Hansen, J.L.; Freier, E.F. The measurement of serum magnesium by atomic absorption
spectrophotometry. Am. J. Med. Technol. 1967, 33, 158-166.

20. Bonsnes, R.W.; Taussky, H.H. On the colorimetric determination of creatinine by the Jaffe
reaction. J. Biol. Chem. 1945, 158, 581-591.

21. Connerty, H.V.; Briggs, A.R. Determination of serum calcium by means of
orthocresolphthalein complexone. Am. J. Clin. Pathol. 1966, 45, 290-296.

22. Drewes, P.A. Direct colorimetric determination of phosphorus in serum and urine. Clin.
Chim. Acta, 1972, 39, 81-88.

23. Liddle, L.; Seegmiller, J.E.; Laster, L. Enzymatic spectrophotometric method for
determination of uric acid. J. Lab. Clin. Med. 1959, 54, 903-913.

24. aiNET Artificial Neural Network Version 1.25; User's Manual; Trubarjeva 42, SI-3000 Celje,
Slovenia, Europe, 1995 (e-mail: ainet@siol.net).

25. Grabec, I. Self-Organisation of Neurons Described by the Maximum-Entropy Principle.
Biol. Cybern. 1990, 63, 403-409.




Table 1. Urinary Promoters and Inhibitors of Crystallization^a

                                              normal children            hematuria                  urolithiasis
parameters                                    md       min      max      md       min      max      md       min      max
urine saturation                              1.91     0.76     3.48     4.32     0.74     16.29    19.59    0.72     25.20
calcium/creatinine (mmol/mmol)                0.17     0.08     0.33     0.22     0.11     0.79     0.30     0.08     0.56
oxalate/creatinine (mmol/mol)                 46.00    19.00    76.00    51.00    8.00     111.00   75.00    20.00    111.00
citrate/calcium ratio (mmol/mmol)             1.81     0.48     3.59     1.33     0.31     7.74     0.71     0.17     1.38
magnesium/calcium × oxalate ratio (mmol)      1.05     0.13     3.31     0.80     0.09     4.15     0.92     0.06     3.44
oxalate/citrate × glycosaminoglycans ratio    19.09    7.64     34.80    43.17    7.42     257.85   195.49   7.72     631.77
  (mmol x 10'*)

^a Md, median; min, minimum; max, maximum.

Table 2. Validity Indexes for Examined Promoters and Inhibitors of Urolithiasis

                                              cut-off value
parameters                                    normal children   urolithiasis         accuracy (95% CI)       specificity (95% CI)    sensitivity (95% CI)
urine saturation                              < 4.70            ≥ 4.70               88.89% (75.15-95.84%)   93.33% (71.27-99.67%)   86.67% (70.90-95.62%)
calcium/creatinine (mmol/mmol)                < 0.20            ≥ 0.20               82.22% (67.42-91.49%)   73.33% (47.47-90.90%)   86.67% (70.90-95.62%)
oxalate/creatinine (mmol/mol)                 < 48.00           ≥ 48.00              75.56% (60.14-86.61%)   53.33% (26.68-76.80%)   86.67% (70.90-95.62%)
citrate/calcium ratio (mmol/mmol)             > 1.38            ≤ 1.38               91.11% (77.87-97.11%)   73.33% (47.47-90.90%)   100% (90.05-100%)
magnesium/calcium × oxalate ratio (mmol)      a                 a                    a                       a                       a
oxalate/citrate × glycosaminoglycans ratio    10.16-34.80       < 10.16 or > 34.80   97.78% (86.77-99.88%)   93.33% (71.27-99.67%)   100% (90.05-100%)
  (mmol x 10")

a - not possible to determine

Table 3. Number of positive risk factors in normal, hematuria and urolithiasis groups according to
cut-off values

                   Number of Positive Risk Factors
Groups             0            1            2            3            4             5
Normal children    5 (33.33%)   4 (26.67%)   5 (33.33%)   1 (6.67%)    0             0
Hematuria          1 (2.78%)    5 (13.89%)   4 (11.11%)   9 (25.00%)   12 (33.33%)   5 (13.89%)
Urolithiasis       0            0            0            1 (3.33%)    10 (33.33%)   19 (63.33%)

Abstract

The algebra of the s- and r-vectors is an adequate formal tool to describe chemical objects in an
abstract way. Compounds as well as reactions are represented including all constitutional and
configurational aspects. The stereochemistry of simple organic molecules as well as of
metal-organic compounds may be described in a unique way. Ionic bonds, covalent bonds, aromatics
and electron deficiency compounds can be formally described without loss of information.
Even reaction types and the flow of electrons can be described using this algebra. The biggest
benefit of this approach is its intrinsic group theoretical structure. The chemist need not be
concerned with this structure when using the algebra, but it allows the computer to handle and
structure huge amounts of chemical data. This is especially important for combinatorial chemistry.

Usually, classical chemical syntheses from n starting materials require sequences of at least
n-1 preparation steps including separation and purification of the intermediates. One-pot
syntheses by multicomponent reactions (MCR) are a perfect alternative for the rapid synthesis of
large varieties of agrochemically and pharmaceutically relevant products. Four to seven
different types of participants (i.e. different isocyanides, amines, etc.) mixed in a reaction
vessel undergo the transformation to one molecule. Using more than one representative of each
type of starting material, all possible combinations will lead to a molecular library of products
formed according to the given reaction scheme. This is a prerequisite for finding compounds
with desired properties.

The main efforts are to develop procedures that lead to the optimal compound with specific
properties, i.e. methods for finding the compound with the best effects and the fewest side
effects.

The design of library syntheses and the handling of the results require adequate mathematics
and computer tools. For designing molecular libraries containing the sought compound one
needs other information than for classical synthesis planning. How many different compounds
will the library contain? How similar or different are these? Will the compounds show
functional groups in similar spatial arrangements? The basic problem is the management of the
flood of information which is generated. The combinatorial product space based on MCR
approaches contains orders of magnitude more structures than all existing structure databases
together. Using a combination of two Ugi four-component reactions (involving di-carboxylic
acids) the available product space will cover 10^14 different structures from 500 different
starting materials. Such vast amounts of data cannot be handled by usual database systems. At
present the automated syntheses of these compounds would take thousands of years.

Data may not be assigned to single structures but to sets of structures, determined by the
starting compounds and the resulting reactions. Instead of comparing single structures one
must compare collections without having the individual elements. Individuals cannot be
obtained by selection from an intractable set but by selective generation. Such highly efficient
methods require a cleverly thought-out representation of the chemical objects compound and
reaction, as it is given by the algebra of the s- and r-vectors.

Originally designed for describing the stereochemistry of chemical reactions via permutational
isomerism, the algebra of the s- and r-vectors is useful for managing molecular libraries as well.
It covers all combinatorial, constitutional, stereochemical and topological aspects of
combinatorial chemistry on a sound mathematical basis. The little mathematics used,
mostly group theory, will be explained and illustrated with chemical examples.

1  Principles  of  MCR-Based  Combinatorial  Chemistry 

Combinatorial chemistry[1][2][3] is a rather simple technique to reach many different
compounds within a short period of time. Basically there are two different approaches. One
approach is the construction of chains of molecules from building blocks, usually all of the
same kind, for example peptides out of amino acids. This must be done step-wise to get
well-defined sequences. Multicomponent reactions (MCR), however, differ from this principle due
to the construction plan that is inherent to them.[4] The sequence is always determined by the
reaction scheme. No sophisticated procedures are needed. All combinations of the starting
materials according to the reaction scheme are possible (Fig. 1). The set of combinations is
called a molecular library.
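
The construction principle can be sketched in a few lines of Python. This is only an illustration: the class names A to D and the member labels are hypothetical placeholders, and each tuple merely stands for the product that the reaction scheme would assemble from one member of each class.

```python
from itertools import product

# Hypothetical educt classes of a four-component reaction (labels are placeholders)
educt_classes = {
    "A": ["A1", "A2"],
    "B": ["B1", "B2", "B3"],
    "C": ["C1", "C2"],
    "D": ["D1"],
}

# Every combination of one member per class is one entry of the molecular library
library = list(product(*educt_classes.values()))

print(len(library))   # 12 = 2 * 3 * 2 * 1
print(library[0])     # ('A1', 'B1', 'C1', 'D1')
```

The library is fully determined by the lists of starting materials, so it never has to be stored explicitly; it can be regenerated, or generated in part, on demand.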


Fig. 1: Scheme of the combinatorial construction principle of a molecular library using a four-component reaction

Using combinatorial methods also means obtaining a lot of information within a short
period of time. Computers help to reduce the cost of managing this information.

1.1  Multicomponent  Reactions  for  Combinatorial  Chemistry 

From the methodological point of view MCR are the kind of chemistry best suited to
combinatorial synthesis: MCR may be carried out on solid phase or in liquid phase by so-called
“Eintopf-Reaktionen” (one-pot reactions). In the latter approach, the starting materials are
simply mixed and the program for combining the starting materials is given by the chemistry
of the MCR. For example, the Ugi four-component reaction (Ugi-4CR) (Fig. 2) combines
an amine, an aldehyde (or a ketone), a carboxylic acid (or another proton donor) and an
isocyanide.[5][6] These are called the educt classes.

Fig. 2: Two Ugi-4CR: one with a carboxylic acid, the other with water as an acid component


There are some requirements for multicomponent chemistry to work for combinatorial
synthesis. Besides the typical essentials like yields and selectivity, the MCR must operate for a
wide variety of representatives of the educt classes, i.e. the MCR must be generic: most of
the starting materials will react according to the given reaction scheme. This is usually the case
if the single reaction steps follow a certain pattern: some of the reaction steps between
starting materials and intermediates equilibrate, while the final steps, which proceed towards
the desired product, are irreversible (Fig. 3). This type is called a type-II-MCR. Usually
quantitative yields and an immensely large variability of starting materials and products
result.[7]


a ⇌ a-b ⇌ a-b-c → a-b-c-d

Fig.  3:  Type-II-MCR  have  a  final  quasi  irreversible  step 

All the side-reactions must also be reversible. In the case of the Ugi-4CR this holds
for the Hellmann-Opitz-3CR but not for the Passerini-3CR. The latter must be excluded by
optimizing the reaction conditions towards the Ugi-4CR, which in this case is easy to achieve
through the choice of solvent.


1.2  MCR  in  Combinatorial  Practice 

In 1961, when Ugi suggested using MCR in a combinatorial way to produce molecular
libraries[6], automated techniques for synthesis or screening had not yet been developed. At that
time the paradigm in preparative organic chemistry was the synthesis of pure compounds. In the
early nineties this paradigm shifted as a need emerged for more compounds to match the
improved and accelerated screening capabilities.

The combinatorial multicomponent chemistry gives a well-known degree of diversity and a
high number of compounds.[4][8] The size of the molecular space M made up by an nCR is
defined by the sizes of the n educt classes n_1, n_2, ..., n_n:

\[ |M| = \prod_{i=1}^{n} n_i \]    (1)

In fact the Ugi-4CR also produces a stereo-centre at the carbonyl-C if aldehydes other than
formaldehyde or non-symmetric ketones are used. Considering the available starting materials
(mostly found in the catalogues of leading chemical manufacturers) there are about 10^14 different
combinations. This set of combinations is called the Ugi-4CR product space. Any molecular
Ugi-4CR library is a subset of this product space.

Micro titre plates are a very suitable device for the automated synthesis of MCR libraries
(Fig. 4). They are available in different densities of up to some 3000 wells per plate. Figure 4
shows how an 8x12 micro titre plate can be used for the synthesis of a 96-library by mixing 4
aldehydes, 3 amines, 4 carboxylic acids and 2 isocyanides. Each well contains a unique
mixture from which an Ugi-4CR product may evolve. The well is identified by a 2-dimensional
location. The mixture (A2, B1, C1, D2) will be positioned at (4, 5).


Fig. 4: Distribution of 13 different starting materials on a micro titre plate in order to produce a 96-library.
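
One possible layout of the 96-library can be sketched as follows. The original figure is not reproduced here, so the mapping is an assumption: the letters A to D are taken to denote the aldehydes, amines, carboxylic acids and isocyanides in the order given in the text, the 12 columns are assumed to enumerate the (A, B) pairs and the 8 rows the (D, C) pairs. This choice is made only because it reproduces the quoted position (4, 5) for (A2, B1, C1, D2); the function name is hypothetical.

```python
from itertools import product
from math import prod

# Sizes of the four educt classes: 4 + 3 + 4 + 2 = 13 starting materials
CLASS_SIZES = {"A": 4, "B": 3, "C": 4, "D": 2}


def well_position(a, b, c, d):
    """Map 1-based educt indices to a 1-based (column, row) on an 8x12 plate.

    Assumed layout: columns enumerate (A, B) pairs, rows enumerate (D, C) pairs.
    """
    column = (a - 1) * CLASS_SIZES["B"] + b
    row = (d - 1) * CLASS_SIZES["C"] + c
    return column, row


positions = {
    well_position(a, b, c, d)
    for a, b, c, d in product(range(1, 5), range(1, 4), range(1, 5), range(1, 3))
}

print(prod(CLASS_SIZES.values()))  # 96 combinations, as given by Eq. (1)
print(len(positions))              # 96 distinct wells
print(well_position(2, 1, 1, 2))   # (4, 5)
```

Because the mapping is one-to-one, every well holds exactly one combination and the plate position alone identifies the product.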

Figure 5 shows an array of micro titre plates necessary for the synthesis of a 960'000-library
emerging from 40 aldehydes, 30 amines, 40 carboxylic acids and 20 isocyanides. This is still
far away from the possibilities of the Ugi-4CR.

40 x 30 = 1200

Fig. 5: An array of micro titre plates necessary for a 960'000 library.

Using high-density titre plates does not solve the problem, because this reduces the
problem only by a factor of 10^2. To give an impression of the size of the problem: assume each
well is filled with 1 µg of starting materials. Further assume that, at the present state of the
art, an automaton fills each titre plate within 1 second. Then the complete Ugi-4CR product
space with 10^14 different combinations will consume 100 tons of starting materials within
33'000 years.
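
The two figures can be checked with a short calculation. The sketch assumes a product space of 10^14 combinations, 1 µg of starting materials per well and one plate per second as stated in the text; the plate size of 96 wells is an assumption, chosen because it is the 8x12 format discussed above.

```python
combinations = 10**14        # size of the Ugi-4CR product space
mass_per_well_g = 1e-6       # 1 microgram of starting materials per well
wells_per_plate = 96         # assumption: 8 x 12 micro titre plate
seconds_per_plate = 1        # one plate filled per second

total_mass_tonnes = combinations * mass_per_well_g / 1e6   # grams -> tonnes
plates = combinations / wells_per_plate
years = plates * seconds_per_plate / (365.25 * 24 * 3600)

print(f"{total_mass_tonnes:.0f} tonnes, about {years:.0f} years")
```

Under these assumptions the calculation gives 100 tonnes and roughly 33'000 years, in agreement with the numbers quoted in the text.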

1.3 Need for Computers

To find the best matching compound concerning sought properties out of the available product
spaces is an optimization problem, because there is no algorithm mapping properties onto a
(set of) compound(s). For the designer of a molecular library the intractability of this problem
results entirely from the size of the product spaces. No matter whether one is selecting starting
materials corresponding to a given specification, specifying a sublibrary, or optimizing a lead,
one always has to face the enormous molecular space one works within. It is not possible to
produce the whole molecular space and then select the interesting structures. No computer
could store or investigate that much information. The information must be generated
selectively. Answers must be found on the level of the starting compounds and construction
principles that are given by the chosen multicomponent chemistry.

Effective and efficient representations of the chemical objects are needed for computer support.
As so often in chemistry, group theory helps a lot.

1.4  Formal  Basis 

Little mathematics is needed for a formal representation of combinatorial chemistry. The
complete mathematical discourse on this topic is given in the cited literature. This
chapter gives all necessary information without proofs but explains the most important
mathematical relationships.

1.4.1  Permutations 

A permutation π = (l_a l_b l_c ... l_j) is a mapping from the set L = {l_1, l_2, ..., l_n} onto L,

\[ \pi : L \to L, \quad \pi \ \text{bijective} \]    (2)

according to π(l_a) = l_b, π(l_b) = l_c, ..., π(l_j) = l_a. All further elements are mapped to
themselves. This mapping may also be noted as a vector

\[ \pi = (\pi(l_1), \pi(l_2), \ldots, \pi(l_n)) \]    (3)

A composition of two permutations π = π₂π₁ applies first π₁ and then π₂. If two
permutations are disjoint (that is, they have no elements in common) the sequence of
application has no effect, e.g. (1 5)(3 4) = (3 4)(1 5). Also the element a cycle starts with
makes no difference, e.g. (1 5 6) = (5 6 1), but naturally the sequence within the cycle does,
e.g. (1 5 6) ≠ (1 6 5)! Non-disjoint permutations can be “normalized” so that the cycles
become disjoint, e.g. (1 5)(1 5 6) can be transformed to (5 6).
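
The four statements about cycles can be verified mechanically. The following sketch is an illustration, not part of the cited formalism: permutations are stored as dictionaries that map each moved element to its image, the helper names are hypothetical, and a product is evaluated from right to left, which is the convention that reproduces (1 5)(1 5 6) = (5 6).

```python
def cycle(*elements):
    """Return the cycle (e1 e2 ... ek) as a dict mapping element -> image."""
    return {e: elements[(i + 1) % len(elements)] for i, e in enumerate(elements)}


def compose(second, first):
    """Return the permutation that applies `first` and then `second`."""
    result = {}
    for x in set(first) | set(second):
        y = first.get(x, x)      # elements not listed are mapped to themselves
        z = second.get(y, y)
        if z != x:               # fixed points are left implicit
            result[x] = z
    return result


# Disjoint cycles commute
print(compose(cycle(1, 5), cycle(3, 4)) == compose(cycle(3, 4), cycle(1, 5)))  # True
# The starting element of a cycle is irrelevant
print(cycle(1, 5, 6) == cycle(5, 6, 1))                                        # True
# The order within a cycle matters
print(cycle(1, 5, 6) == cycle(1, 6, 5))                                        # False
# "Normalizing" a non-disjoint product: (1 5)(1 5 6) = (5 6)
print(compose(cycle(1, 5), cycle(1, 5, 6)) == cycle(5, 6))                     # True
```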


1.4.2 Automorphisms and Groups for Structuring Chemical Information

A group G = (L, ·) is a pair of a set L and an operation · in L with the following properties:

(1) Closure: For any two elements α, β ∈ L, the product α·β is also an element of L;

(2) Associativity: For all elements α, β, γ ∈ L, we have α·(β·γ) = (α·β)·γ;

(3) Existence of an identity element: There exists an element ε ∈ L such that for all
elements α ∈ L, α·ε = ε·α = α;

(4) Existence of inverse elements: For any element α ∈ L there exists an element α⁻¹ ∈ L such
that α·α⁻¹ = α⁻¹·α = ε.

This paper deals exclusively with finite groups, i.e. groups with a finite set of elements.

Each U = (M, ·) with M ⊆ L is called a subgroup of G, written U ≤ G, if U is closed with
respect to the operation ·. Groups (and subgroups) are usually defined by a set of generators S.
All possible combinations of elements of the set S give the group ⟨S⟩.

For example, the set of permutations L = {(), (1 2 3), (1 3 2), (1 2)(4 5), (1 3)(4 5), (2 3)(4 5)}
with the composition of permutations ∘ forms a group G = (L, ∘). The set
M = {(), (1 2 3), (1 3 2)} forms a subgroup U = (M, ∘) ≤ G. The set S = {(1 2), (1 2 3 4 5)}
generates the group Σ₅ that covers all 120 possible permutations of up to five elements. G as
well as U are subgroups of Σ₅.
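
These statements can be checked by brute force. In the sketch below, which is an illustration with hypothetical helper names, a permutation of {1, ..., 5} is stored as the tuple of its images, and a group is generated by repeatedly multiplying with the generators until no new element appears; this is sufficient only because the groups are finite.

```python
def perm(n, *cycles):
    """Permutation of {1..n} as an image tuple, built from disjoint cycles."""
    images = list(range(1, n + 1))
    for cyc in cycles:
        for i, e in enumerate(cyc):
            images[e - 1] = cyc[(i + 1) % len(cyc)]
    return tuple(images)


def compose(second, first):
    """Apply `first`, then `second`."""
    return tuple(second[first[i] - 1] for i in range(len(first)))


def is_closed(elements):
    """True if the product of any two elements is again in the set."""
    return all(compose(a, b) in elements for a in elements for b in elements)


def generate(generators):
    """Close a list of generators under composition (finite groups only)."""
    identity = tuple(range(1, len(generators[0]) + 1))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


L = {perm(5), perm(5, (1, 2, 3)), perm(5, (1, 3, 2)),
     perm(5, (1, 2), (4, 5)), perm(5, (1, 3), (4, 5)), perm(5, (2, 3), (4, 5))}
M = {perm(5), perm(5, (1, 2, 3)), perm(5, (1, 3, 2))}
S = [perm(5, (1, 2)), perm(5, (1, 2, 3, 4, 5))]

full = generate(S)
print(is_closed(L), is_closed(M), len(full), L <= full)  # True True 120 True
```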

The group-theoretic notion of a coset plays an important role. A coset space of G is generated by
a subgroup U ≤ G. For any α ∈ G the set α·U = αU is called a left coset of U in G. The left coset
αU can be described as αU = {γ ∈ G: α⁻¹γ ∈ U}. In section 2 it will be shown that each coset
represents exactly one product of a library. The index [G:U] of U in G is the number of distinct
(left) cosets. All the left cosets of U in G have the same size. Because the identity element is a
member of G, U itself is a left coset. Therefore all cosets of U in G have the same size as U.
Each γ ∈ G belongs to at least one left coset of U in G, and any two left cosets of U in G are
either identical or disjoint. Therefore G is a disjoint union of all the cosets of U in G. The index
[G:U] is equal to |G| / |U|. Each coset is uniquely represented by any of its members together
with the subgroup U. A set of one representative per coset is called a transversal.
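
For the small example group G and its subgroup U from above, the coset decomposition can be computed directly. The sketch is illustrative and uses hypothetical names; permutations of {1, ..., 5} are again written as image tuples, and the left coset αU is formed as {α·u : u ∈ U} with u applied first.

```python
def compose(second, first):
    """Apply `first`, then `second` (image-tuple representation on 1..n)."""
    return tuple(second[first[i] - 1] for i in range(len(first)))


def left_cosets(group, subgroup):
    """Partition `group` into the left cosets aU = {a*u : u in U}."""
    cosets, seen = [], set()
    for a in sorted(group):
        if a in seen:
            continue
        coset = frozenset(compose(a, u) for u in subgroup)
        cosets.append(coset)
        seen |= coset
    return cosets


e = (1, 2, 3, 4, 5)      # ()
c = (2, 3, 1, 4, 5)      # (1 2 3)
c2 = (3, 1, 2, 4, 5)     # (1 3 2)
t12 = (2, 1, 3, 5, 4)    # (1 2)(4 5)
t13 = (3, 2, 1, 5, 4)    # (1 3)(4 5)
t23 = (1, 3, 2, 5, 4)    # (2 3)(4 5)

G = {e, c, c2, t12, t13, t23}
U = {e, c, c2}

cosets = left_cosets(G, U)
transversal = [min(coset) for coset in cosets]   # one representative per coset

print(len(cosets), len(G) // len(U))             # 2 2
print(all(len(coset) == len(U) for coset in cosets))  # True
```

The two cosets are U itself and the set of the three remaining elements; together they cover G without overlap, so the index [G:U] equals |G| / |U| = 2.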

Groups are capable of structuring sets (of molecules) by making up a coset space. In the example above, two permutations define a set of 120 elements, and a small subset that generates a subgroup structures these 120 elements into cosets. This is why group theory is of high value for structuring big sets.

This approach was initiated by Dugundji and Ugi in 1984 and refined by Dietz, Gruber and Ugi.[11][12][13][14] Several kinds of equivalence relations (for example chemical identity) have been applied to structure chemical information.

1.4.3 Automorphisms for Molecules and Reactions

Groups based on permutations are called automorphism groups. Automorphism groups are extremely useful, not only to describe the stereochemistry of molecules, but also for constitutional phenomena, i.e. chemical reactions. In 1971 Ugi and Dugundji represented the constitutional part of chemistry by the redistribution of binding electrons, formalized as the algebra of the be- and r-matrices.[15] In 1992 an algebraic model of stereochemistry was given that incorporates stereochemistry into chemical reactions and vice versa. Section 3.7 of [16] deals with the fusion of the theory of the chemical identity group with the algebra of the be- and r-matrices. The result was the algebra of s- and r-vectors.[12][13][16][17] Here s stands for stereochemical and r for reaction. The derivation of this algebra is also given in Ref. [13], pp. 9-11. The algebra of s- and r-vectors is capable of describing structures and reactions with regard to stereochemistry [13], delocalized electron systems and electron deficient compounds [17].

Each atom has localized topological positions that function as connection points. An sp³-carbon typically possesses four topological positions. The positions are called topological rather than geometric because their geometric location is not known but their neighbourhood is. Chemical structures are represented by one-to-one mappings of the topological positions of (different) atoms. This is isomorphic to permuting a finite set of ligands, but unlike the theory of the chemical identity group there is no longer any distinction between atoms of the ligands and atoms of the skeleton. Any (set of) molecule(s) is represented by a permutation. Therefore chemical reactions are simply permutations applied to permutations.


Fig. 6 shows the constitutional changes of the Ugi-4CR, which is represented by the matrix R. The matrix E represents the starting materials: an aldehyde, an amine, a carboxylic acid and an isocyanide. The Ugi-product and one molecule of water are represented by the matrix B.

Fig. 6: Ugi-4CR represented as be- and r-matrices
The equation $B = R + E$ is fundamental to the algebra of the be- and r-matrices. The matrices exclusively carry information about the distribution of the binding electrons, whereas the r- and s-vectors are representations of the topological positions on the atoms (Fig. 7).[12] The first line is needed as a reference for which topological positions are mapped. The s-vector $\varepsilon$ defines the starting materials, and $\rho$ is the Ugi-4CR r-vector. The s-vector $\beta$ results from the application of $\rho$ onto $\varepsilon$: $\beta = \rho \circ \varepsilon$. Fig. 7 shows the Ugi-4CR with the topological positions and two different notations for the permutations $\varepsilon$, $\rho$ and $\beta$: as vectors and as cycles (see section 1.4.1).

Fig. 7: The Ugi-4CR noted in terms of s- and r-vectors
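The equation $B = R + E$ can be tried out on a much smaller reaction than the Ugi-4CR. The sketch below is a toy example chosen for illustration only (H2 + Cl2 giving 2 HCl), not the matrices of Fig. 6. It assumes the usual be-matrix convention: an off-diagonal entry is the formal bond order between two atoms and a diagonal entry is the number of free valence electrons on an atom.

```python
# Atom order (an assumption for this toy example): H1, H2, Cl1, Cl2
E = [[0, 1, 0, 0],   # educts: H1-H2 and Cl1-Cl2,
     [1, 0, 0, 0],   # each chlorine carries 6 free valence electrons
     [0, 0, 6, 1],
     [0, 0, 1, 6]]

R = [[0, -1, 1, 0],  # reaction: break H1-H2 and Cl1-Cl2,
     [-1, 0, 0, 1],  # make H1-Cl1 and H2-Cl2
     [1, 0, 0, -1],
     [0, 1, -1, 0]]

# B = R + E, entry by entry
B = [[r + e for r, e in zip(r_row, e_row)] for r_row, e_row in zip(R, E)]

def electron_count(matrix):
    """Sum of all entries = total number of valence electrons."""
    return sum(sum(row) for row in matrix)

for row in B:
    print(row)
```

The entries of R sum to zero, so the total number of valence electrons is the same in E and B; only their distribution changes.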

Molecules (or ensembles of them) correspond with permutations. Families of permutational isomers correspond with automorphism groups. The algebra of s- and r-vectors allows this approach to be extended to sets of “isomeric” ensembles of molecules.


2  A  Precise  Representation  for  Molecular  Libraries 

Just as automorphism groups are useful for representing families of permutational isomers in an efficient way, they can represent molecular libraries.[16][19] Molecular libraries are used in combinatorial chemistry for finding and optimizing new drugs. There is a big space of molecules corresponding with each MCR. Libraries are subsets of this space. In order to decide what libraries are most promising one must use tools that determine these sets.

Due to its size the entire molecular space of an MCR cannot be produced as compounds, nor can it be represented as a list. Group theory is capable of representing a set of molecules by a subset (a so-called set of generators) and a construction principle. This saves memory and reduces the exponential complexity to quadratic complexity, which is important for both computers and the designer of molecular libraries.

Notions of group theory have chemical interpretations. For example a group may define an equivalence relation. That means the members of the group are equivalent concerning a defined property, for example they belong to the same aromatic system. Subgroups of this group may refine the equivalence relation. An example is the subgroup given by all permutations applied to molecular parts with a maximum mass of 150 daltons.

2.1  Molecular  Libraries  represented  by  Notions  of  Group  Theory 

Each n-CR leads to a specific backbone with n topological sites. At these sites different ligands will be positioned by different starting materials. Each kind of starting material has its well-defined site due to the “deterministic” reaction mechanisms. The amines in the Ugi-4CR (Fig. 2) will always define the same part of the backbone, etc. By transposing the subgraph defined by the amine with another amine you get a different “permutational isomer” which is a member of the library. The only difference between families of permutational isomers and molecular libraries is that in the latter the permutations are applied to extra-molecular parts.


Fig. 7 shows the Ugi-4CR in terms of s- and r-vectors. The Ugi-4CR (as well as any other 4CR of type II) can be simplified to the scheme given in Fig. 3: The result of the Ugi-4CR of educts a, b, c and d will be a molecule a–b–c–d. The educts must be of the appropriate classes A, B, C, D, i.e. “a” must be an acid component (class A), “b” an amine (class B), “c” an aldehyde or ketone (class C) and “d” an isocyanide (class D).

The size of the product space is determined by the sizes of the educt classes $|A|$, $|B|$, $|C|$ and $|D|$. Assume that the numbers of commercially available educts are $|A| = 200$, $|B| = 200$, $|C| = 150$ and $|D| = 20$. According to formula (1) the resulting product space $M$ will cover $|M| = |A| \cdot |B| \cdot |C| \cdot |D| = 200 \cdot 200 \cdot 150 \cdot 20 = 1.2 \cdot 10^{8}$ different products.

For a construction principle for Ugi-4CR libraries, let the members of the educt classes be identified by $a_1, \ldots, a_{200}$, $b_1, \ldots, b_{200}$, $c_1, \ldots, c_{150}$ and $d_1, \ldots, d_{20}$. Furthermore let the sequence $m$ = $a_1$–$b_1$–$c_1$–$d_1$ represent a specific Ugi-4CR reference product. Then $s$ = {$(a_1\ a_2)$, $(a_1\ a_2 \ldots a_{200})$, $(b_1\ b_2)$, $(b_1\ b_2 \ldots b_{200})$, $(c_1\ c_2)$, $(c_1\ c_2 \ldots c_{150})$, $(d_1\ d_2)$, $(d_1\ d_2 \ldots d_{20})$} is a set of generators of the group $S = \langle s \rangle$, which covers all permutations among members of the same class. Such permutations are for example $\alpha = (a_1\ a_2)$, $\beta = (a_1\ a_{50})(b_1\ b_2)$ and $\gamma = (a_2\ a_3)$, but neither $\pi = (a_1\ b_1)$ nor $\rho = (a_1\ b_2)(b_1\ a_2)$, because the molecules $\alpha \circ m$ = $a_2$–$b_1$–$c_1$–$d_1$, $\beta \circ m$ = $a_{50}$–$b_2$–$c_1$–$d_1$ as well as $\gamma \circ m$ = $a_1$–$b_1$–$c_1$–$d_1$ (= $m$) are valid in the sense that the position “a” is always kept by a member of class A, “b” by B and so on. This is definitely not true for $\pi \circ m$ = $b_1$–$a_1$–$c_1$–$d_1$ nor for $\rho \circ m$ = $b_2$–$a_2$–$c_1$–$d_1$.
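The validity criterion can be sketched in code. The labels `a1`, `b1`, ... and the helpers below are hypothetical, and $\gamma$ is taken here as a transposition of two non-reference acids, which is an assumption (any transposition that avoids the reference educts behaves the same way).

```python
REFERENCE = ("a1", "b1", "c1", "d1")   # reference product m

def from_cycles(cycles):
    """Permutation of educt labels as a dict; labels not mentioned stay fixed."""
    mapping = {}
    for cycle in cycles:
        for k, x in enumerate(cycle):
            mapping[x] = cycle[(k + 1) % len(cycle)]
    return mapping

def apply(perm, product):
    """Replace every educt label in the product by its image."""
    return tuple(perm.get(x, x) for x in product)

def is_valid(product):
    """Position 'a' must hold a member of class A, 'b' of class B, and so on."""
    return all(label[0] == cls for label, cls in zip(product, "abcd"))

alpha = from_cycles([("a1", "a2")])
beta = from_cycles([("a1", "a50"), ("b1", "b2")])
gamma = from_cycles([("a2", "a3")])      # valid, but leaves m unchanged
pi = from_cycles([("a1", "b1")])         # mixes classes A and B

for name, perm in [("alpha", alpha), ("beta", beta), ("gamma", gamma), ("pi", pi)]:
    result = apply(perm, REFERENCE)
    print(name, "-".join(result), is_valid(result))
```

Permutations built only from cycles within one class always pass the check, which is why the generators of $S$ are chosen class by class.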

As $\gamma$ shows, there are permutations that are valid but do not affect the molecule. These permutations transpose members “within” the class itself. They are defined by the group $T = \langle\{(a_2\ a_3), (a_2\ a_3 \ldots a_{200}), (b_2\ b_3), (b_2\ b_3 \ldots b_{200}), (c_2\ c_3), (c_2\ c_3 \ldots c_{150}), (d_2\ d_3), (d_2\ d_3 \ldots d_{20})\}\rangle$, i.e. the elements $a_1$, $b_1$, $c_1$ and $d_1$ are not involved.

Each left-coset $\alpha \circ T$ of $T$ in $S$ corresponds with exactly one molecule of the library. This means the library is defined by the left coset space $M = \{\alpha T \mid \alpha \in S\}$. The size of $M$ is given by

$$|M| = \frac{|S|}{|T|} = \frac{|A|!\,|B|!\,|C|!\,|D|!}{(|A|-1)!\,(|B|-1)!\,(|C|-1)!\,(|D|-1)!} = |A| \cdot |B| \cdot |C| \cdot |D| = 1.2 \cdot 10^{8}$$

A  molecular  library  is  defined  by  the  coset  space  or  any  corresponding  traverse  (section  1.4.2). 
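The index $|S| / |T|$ can be evaluated directly with exact integer arithmetic. This is a short check of the formula, using the class sizes assumed in the text; the variable names are hypothetical.

```python
from math import factorial, prod

class_sizes = {"A": 200, "B": 200, "C": 150, "D": 20}

# |S|: all permutations within each class; |T|: those that avoid the reference educts
order_S = prod(factorial(n) for n in class_sizes.values())
order_T = prod(factorial(n - 1) for n in class_sizes.values())

library_size = order_S // order_T
print(library_size)
```

Although $|S|$ and $|T|$ are astronomically large, their quotient is just the product of the class sizes.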

2.2  Costs 

The computer does not calculate all the elements of a group. Instead a so-called representation matrix is built.[16][18] This is an upper triangular matrix of size $n \times n$, where $n$ is the number of elements to permute, in this case the number of educts. The process of constructing a representation matrix runs with quadratic time and space complexity with respect to the number $n$ of elements to permute.
In the given example of the Ugi-4CR product space there are $|A| + |B| + |C| + |D| = 570$ educts to distribute on the backbone, which corresponds with a subset of $\mathfrak{S}_{570}$, the set of all possible permutations on 570 objects.

To store $\mathfrak{S}_{570}$ as a representation matrix you need 162,165 permutations instead of 570! (> $10^{1000}$). The representation matrix of $S < \mathfrak{S}_{570}$ has 51,140 defined entries and that of $T < S$ only 50,599. Using this concept one can do membership tests with quadratic complexity and can easily construct subgroups of already represented groups.
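The quoted entry counts can be approximated with a simple counting model. The sketch below assumes that a symmetric group on $k$ objects contributes $k(k-1)/2$ non-identity entries (the strict upper triangle) and that a direct product of such groups contributes the sum over its factors; this model is an assumption, not taken from the paper. It reproduces 162,165 and 50,599 exactly, while for $S$ it gives 51,165 rather than the quoted 51,140.

```python
from math import factorial

def entries(k):
    """Strict upper triangle of a k x k matrix."""
    return k * (k - 1) // 2

class_sizes = (200, 200, 150, 20)

full = entries(sum(class_sizes))                      # symmetric group on 570 objects
group_S = sum(entries(k) for k in class_sizes)        # permutations within classes
group_T = sum(entries(k - 1) for k in class_sizes)    # reference educts excluded

print(full, group_S, group_T)
print(factorial(570) > 10 ** 1000)
```

In every case the number of stored permutations grows quadratically with the number of educts, in contrast to the factorial growth of the group itself.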


3  Applications  of  this  Approach 

In order to give an idea of what the representation matrices look like and how the concept may be applied, a rather small example is given. It is based on an Ugi-4CR library in which each class has 5 members (5·5·5·5-Ugi-4CR library). Bigger examples can easily be handled by computers but are too large to be printed.

3.1 The Management of a 5·5·5·5-Ugi-4CR Library

The following example uses the Ugi-4CR with 5 compounds of each class. The 5 carboxylic acids are responsible for filling part $R_a$ of the Ugi-4CR-backbone (Fig. 7) with the molecular residues 1, 2, 3, 4 and 17. The numbers are arbitrary but unique. The 5 aldehydes fill part $R_b$ of the Ugi-4CR-backbone with the molecular residues 5, 6, 7, 8 and 18. The 5 amines fill part $R_c$ of the Ugi-4CR-backbone with the molecular residues 9, 10, 11, 12 and 19. And the 5 isocyanides fill part $R_d$ of the Ugi-4CR-backbone with the molecular residues 13, 14, 15, 16 and 20. One expects $5^4 = 625$ different products.

Each of the resulting Ugi-products is represented by a sequence a–b–c–d, where $a \in \{1, 2, 3, 4, 17\}$, $b \in \{5, 6, 7, 8, 18\}$, $c \in \{9, 10, 11, 12, 19\}$ and $d \in \{13, 14, 15, 16, 20\}$. One of the products is given by the sequence 17–18–19–20. This shall be the reference product. The other products can be obtained by replacing 17 by 1, 2, 3 or 4, replacing 18 by 5, 6, 7 or 8, etc. The resulting set of permutations is $g$ = {(17 4), (17 3), (17 2), (17 1), (18 8), (18 7), (18 6), (18 5), (19 12), (19 11), (19 10), (19 9), (20 16), (20 15), (20 14), (20 13)} and all their combinations.

By combining the permutations of set $g$ you can describe all the different products of the library. However, many of the permutations represent the same product, because some of the combinations result in permutations that do not affect the reference product. An example is the permutation (1 2), which results from the combination (1 17)∘(2 17)∘(1 17). These duplicates may be handled mathematically: The set of all combinations of permutations of set $g$ forms the group $G = \langle g \rangle$. $G$ has $5!^4 = 207{,}360{,}000$ members. The subgroup $U < G$ that results from all possible combinations that do not affect the reference isomer is generated by $U = \langle\{$(1 2), (2 3), (3 4), (5 6), (6 7), (7 8), (9 10), (10 11), (11 12), (13 14), (14 15), (15 16)$\}\rangle$. $U$ has $4!^4 = 331{,}776$ members.

In the above example the index of $U$ in $G$ is $|G| / |U| = 5!^4 / 4!^4 = 5^4 = 625$. The left-coset space of $U$ in $G$ is the family of sets $\{\gamma U \mid \gamma \in G\}$. For any $\gamma \in G$ the left-coset $\gamma U$ describes exactly one member of the library. For example the left-coset (1 17)(5 6)(12 19)$U$ contains the permutation (1 17)(12 19), so it represents the product 1–18–12–20. There is no other product represented by this left-coset and no other left-coset representing the product. Any traverse of the coset space defines the complete library. Naturally it is not the traverse that is calculated but the representation matrices of $G$ and $U$ (Figures 8 and 9).
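The correspondence between permutations and products in this example can be sketched as follows. The helper names are hypothetical, and a product is read off simply as the images of the reference residues 17, 18, 19 and 20 under the permutation.

```python
from itertools import product

REFERENCE = (17, 18, 19, 20)
CLASSES = [(1, 2, 3, 4, 17), (5, 6, 7, 8, 18), (9, 10, 11, 12, 19), (13, 14, 15, 16, 20)]

def from_cycles(cycles, n=20):
    """Permutation of 1..n as a tuple p, where p[i-1] is the image of i."""
    image = list(range(1, n + 1))
    for cycle in cycles:
        for k, x in enumerate(cycle):
            image[x - 1] = cycle[(k + 1) % len(cycle)]
    return tuple(image)

def compose(p, q):
    """Return p o q (apply q first, then p)."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def product_of(perm):
    """Ugi product described by a permutation: images of the reference residues."""
    return tuple(perm[x - 1] for x in REFERENCE)

# Two members of the same left-coset give the same product.
g1 = from_cycles([(1, 17), (5, 6), (12, 19)])
g2 = from_cycles([(1, 17), (12, 19)])

# (1 17) o (2 17) o (1 17) = (1 2), which does not touch the reference product.
duplicate = compose(from_cycles([(1, 17)]),
                    compose(from_cycles([(2, 17)]), from_cycles([(1, 17)])))

# One transposition (or none) per class reaches every product of the library.
library = set()
for choice in product(*CLASSES):
    cycles = [(ref, x) for ref, x in zip(REFERENCE, choice) if x != ref]
    library.add(product_of(from_cycles(cycles)))

print(product_of(g1), product_of(g2), len(library))
```

The 625 permutations built in the last loop form one possible traverse of the coset space.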
3.2 Program behaviour

The group $G$ is generated by the set of permutations $g$ = {(17 4), (17 3), (17 2), (17 1), (18 8), (18 7), (18 6), (18 5), (19 12), (19 11), (19 10), (19 9), (20 16), (20 15), (20 14), (20 13)}. The representation matrix of group $G$ is of dimension 20:

Fig. 8: Group G generated by 16 permutations contains all permutations corresponding with the library

The members of the group are found by composing the permutations of the representation matrix, taking exactly one permutation of each row. Each combination will give a unique member. Consequently, the size of the group can be calculated by multiplying the numbers of defined matrix entries per row: 5·4·3·2·1·5·4·3·2·1·5·4·3·2·1·5·4·3·2·1 = $5!^4$ = 207,360,000. The Pascal implementation on an Apple Macintosh 8500/180 needs less than one second to calculate the representation matrix out of the set of generators $g$.

Fig. 9: Group U generated by 12 permutations


The set of generators of group $U < G$ is {(2 1), (3 2), (4 3), (6 5), (7 6), (8 7), (10 9), (11 10), (12 11), (14 13), (15 14), (16 15)}. The corresponding representation matrix is of dimension 16. The group $U$ has 4·3·2·1·4·3·2·1·4·3·2·1·4·3·2·1 = $4!^4$ = 331,776 members.
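The rule "one permutation per row, each combination a unique member" can be tried on a small case. The sketch below assumes a representation matrix for the full symmetric group on 4 objects whose row $i$ holds the identity and the transpositions $(i\ j)$ with $j > i$; the actual matrices of Figs. 8 and 9 are not reproduced here, so this layout is an assumption. The group orders quoted in the text are then recomputed from the row lengths.

```python
from itertools import product
from math import prod

def transposition(i, j, n):
    """The transposition (i j) on 1..n as an image tuple."""
    image = list(range(1, n + 1))
    image[i - 1], image[j - 1] = j, i
    return tuple(image)

def compose(p, q):
    """Return p o q (apply q first, then p)."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

n = 4
identity = tuple(range(1, n + 1))
rows = [[identity] + [transposition(i, j, n) for j in range(i + 1, n + 1)]
        for i in range(1, n)]

# Compose exactly one entry from each row.
members = set()
for choice in product(*rows):
    g = identity
    for t in choice:
        g = compose(g, t)
    members.add(g)

# Orders of G and U from the numbers of defined entries per row.
order_G = prod([5, 4, 3, 2, 1] * 4)
order_U = prod([4, 3, 2, 1] * 4)

print(len(members), order_G, order_U, order_G // order_U)
```

All 4·3·2 = 24 combinations are distinct, so the product of the row lengths is the group order.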

4  Discussion 

Molecules (or ensembles of them) correspond with permutations. Families of permutational isomers correspond with automorphism groups. The algebra of s- and r-vectors allows this approach to be extended to sets of “isomeric” ensembles of molecules. In this sense MCR libraries are isomeric to the set of educts. Thereby automorphism groups are useful for representing families of permutational isomers as well as molecular libraries or any other structured sets of molecules in an efficient way.

The structuring properties of group theory are useful for the efficient storage of chemical data. The approach works for managing the data generated by combinatorial chemistry. Chemical properties correspond with group-theoretic structures like cosets or subgroups. This structures sets of molecules in a hierarchical manner. Thereby automorphism groups are useful for representing families of permutational isomers as well as molecular libraries or any other structured sets of objects in an efficient way.
ABSTRACT

The set of coset representations, CR's, of a group $G$, $\{G(/G_1), G(/G_2), \ldots, G(/G_s)\}$, where $G_1 = \{I\}$, $G_s = G$; the marks, $m_{ij}$, of subgroup $G_j$ on a given $G(/G_i)$, $1 \le i \le s$; and the subduction of $G(/G_i)$ by $G_j$, $j \le i$, $G(/G_i) \downarrow G_j$, are essential tools for the enumeration of stereoisomers and their classification according to their subgroup symmetry [Fujita, S., Symmetry and Combinatorial Enumeration in Chemistry, Springer-Verlag, Berlin 1991]. In this paper, each $G(/G_i)$ is modelled by a set of coloured equivalent configurations, $\mathcal{H} = \{h_1, h_2, \ldots, h_r\}$; $r = |G|/|G_i|$ (called homomers), such that a given homomer, $h_k$, remains invariant only under all $g \in G_i$, where $g$ is an element of symmetry. The resulting homomers generate the corresponding set of marks almost by inspection. The symmetry relations among a set $\mathcal{H}$ can be conveniently stored in a Cayley-like diagram [Chartrand, G., Graphs as Mathematical Models, Prindle, Weber and Schmidt Incorporated, Boston, MA, 1977, Chapter 10], which is a complete digraph on $r$ vertices so that an arc from the vertex $v_i$ to the vertex $v_j$ is coloured with the set $S_{ij}$ of symmetry elements such that $h_i \xrightarrow{\;g_{ij}\;} h_j$; $g_{ij} \in S_{ij}$. In addition, each vertex, $v_i$, is associated with a loop which is coloured with a set $S_{ii}$ so that $g_{ii} \in S_{ii}$ stabilizes $h_i$.

A Cayley-like diagram of a given CR, $\mathcal{D}[G(/G_i)]$, leads to graphical generation of $G(/G_i) \downarrow G_j$ for all values of $j$ and also to all $m_{ij}$'s.

Several group-theoretical results are rederived and/or become easier to envisage through this modelling. The approach is exemplified using the $C_2$, $C_3$, $D_2$, $T$ and $D_3$ point groups and is applied to trishomocubane, a molecule which belongs to the $D_3$ point group.
1. Introduction and Background

Suppose a parent skeleton which belongs to a given point group, $G$, is subjected to a particular substitution pattern leading to a number of structures. Let $G_1 = C_1\ (= \{I\})$, $G_2$, ..., $G_s$ be the sequence of representative subgroups, where $G_s = G$. One may ask: How many derivatives are there which belong to each subgroup?

In the past few years Fujita has published a number of papers in which he developed powerful group-theoretical methods which answer the above and similar questions. While conventional treatments consider linear representations and character tables of groups, Fujita's approach is based on coset representations, CR's, and tables of marks. Namely, each element of symmetry, $g \in G$, applied to a coset of $G_i$; $1 \le i \le s$, gives another coset, and thus each element of $G$ can be considered as a certain permutation of the cosets, leading to a representation of $G$ in terms of these permutations, called coset representations, $G(/G_i)$. Formally:

$$G(/G_i) = \{\pi_g \mid \forall g \in G\} \qquad ...(1)$$

$$\pi_g = \begin{pmatrix} G_i g_1 & G_i g_2 & \cdots & G_i g_r \\ G_i g_1 g & G_i g_2 g & \cdots & G_i g_r g \end{pmatrix} \qquad ...(2)$$

In eqn (2), $r = |G| / |G_i|$, $|G|$ = order of $G$. The mark $m_{ij}$ of $G_j$ on $G(/G_i)$, $G_j$ being another subgroup of $G$, is the number of cosets left invariant (fixed) by $G_j$. The set of vertices of the molecular graph which undergoes substitution is called the orbit of substitution. Essential to the treatment of Fujita is to classify the orbit which undergoes substitution according to the CR which governs its substitution. This orbit is then subduced by all subgroups of the parent point group in order to obtain the required structural counts, which are expressed in the so-called isomer-count matrix. The subduction of a CR, $G(/G_i)$, by $G_j$ is expressed by eqn (3), which is related to eqn (1) by just attaching the subscript $j$ to $G$ in braces, viz.,

$$G(/G_i) \downarrow G_j = \{\pi_g \mid \forall g \in G_j\} \qquad ...(3)$$

The basic tools here (cosets, marks, subduction tables etc.) are indeed rather abstract in nature and may not draw the attention of an organic chemist who benefits most from the results of this algebra. Here we propose a graphical modelling of mark and subduction tables of CR's using “Cayley-like colour graphs” (or simply Cayley diagrams, for brevity; see below). Due to their diagrammatic nature, graphs are more appealing to chemists, and in addition the suggested graphical model rederives many of the group theoretical results which are conventionally obtained using purely algebraic methods.

2.  Modelling  Coset  Representations  of  Groups: 

First a molecular graph is drawn which remains fixed under all of the symmetry operations of $G$. In FIG. 1 several graphs are drawn which represent models of some of the simpler point groups. The $D_2$ model is a tapered conformation of ethylene while $D_3$ is a twisted ethane which is neither staggered nor eclipsed. The $T$ point group model is an adamantane molecular graph. Orbits of substitution are represented by the (open) vertices of each graph. When all the vertices are open the model is said (here) to be “uncoloured” and by definition such a model represents the regular coset representation, $G(/G)$, since it remains fixed under all $g \in G$. For the other CR's, $G(/G_i)$; $1 \le i < s$, one is dealing with fewer symmetry operations, viz., only those which belong to $G_i$, and hence a particular “colouring” of the original model of $G(/G)$ is adopted so that the coloured graph remains fixed only under $G_i$. Arbitrarily the number of black (closed) vertices is chosen to be a minimum. Further, one must search for all such coloured equivalent configurations which reproduce the permutation properties for the cosets of $G_i$. The chemical term for equivalent configurations is homomers. For a given CR, $G(/G_i)$, this set will be denoted as $\mathcal{H}[G(/G_i)]$ where:

$$\mathcal{H}[G(/G_i)] = \mathcal{H} = \{h_1, h_2, \ldots, h_r\} \qquad ...(4)$$

in which $h_i$ is the $i$-th homomer and $r$ is defined by eqn (2). To find $h_1 \in \mathcal{H}[G(/G_i)]$ we apply all $g \in G_i$ and express the result in cyclic notation such as (ab)(cdef)(...), where the letters in parentheses refer to the labels of vertices of the orbit. Since $h_1$ remains fixed under $G_i$ it must yield the following “vertex-colour” identities:

$$a = b\ ;\quad c = d = e = f\ ,\ \ldots \qquad ...(5)$$

We then select just one such equality which involves the smallest number of vertices and colour them in black. The resulting (coloured) configuration is (arbitrarily) called $h_1$. The other $(r-1)$ elements of the set $\mathcal{H}$ are deduced so that they generate the same cyclic structures of the permutations representing $G(/G_i)$. In Appendix 1, this is demonstrated for $D_3(/C_2)$. FIG. 2 portrays sets of homomers which model the $C_2$, $C_3$ and $D_2$ point groups.
FIGs 3 and 4 model all CR's of the $D_3$ and $T$ point groups respectively.

3. Modelling Marks of a Given $G(/G_i)$ Using $\mathcal{H}[G(/G_i)]$

Firstly, one lists below each homomer the subgroups which leave it fixed and then arranges the results in the form of a row vector, the entries of which are the representative subgroups of $G$ ordered in nondescending order, viz., $|G_1| \le |G_2| \le \ldots \le |G_s|$. In FIGs. 2-4 rows of marks from each CR are shown. The above nondescending sequence is called the sequence of subgroups, SSG.

4. Inter-relations Among a Set of Homomers

In order to further our knowledge of the properties of a set $\mathcal{H}[G(/G_i)]$, the inter-relations among the individual members (homomers) can be outlined either graphically or in matrix form.

4.1. Graphical Representation of $\mathcal{H}$: “Cayley-like Colour Diagrams”

The set of $r$ homomers $\{h_1, h_2, \ldots, h_r\} = \mathcal{H}$ which is associated with a given CR, $G(/G_i)$, can be represented by a complete digraph (directed graph) on $r$ vertices with arcs and loops. Each arc, say from $h_i$ to $h_j$, is “coloured”, so to speak, with a set of symmetry elements, $S_{ij}$, such that

$$h_i \xrightarrow{\;g_{ij}\;} h_j\ ;\quad g_{ij} \in S_{ij}$$

Each vertex, $v_i$, is associated with a loop which is also coloured with a set of elements, $S_{ii}$, so that a given $g_{ii} \in S_{ii}$ stabilises $h_i$. The resulting digraphs, $\mathcal{D}[\mathcal{H}(G/G_i)]$, are reminiscent of “Cayley colour graphs” of groups. FIG. 5 shows $\mathcal{D}$ graphs of CR's of some point groups. We shall demonstrate that the $\mathcal{D}$ graphs contain information on:

a) Coset decomposition of $G$ by $G_i$;

b) Marks of $G(/G_i)$; and

c) $G(/G_i) \downarrow G_j$ (c.f. eqn. 3) for all $G_j$, where $G_j$ is a subgroup of $G$.

Furthermore, the above information can be extracted almost immediately from these digraphs.
4.2 Matrix Representation of $\mathcal{H}$

If $|\mathcal{H}[G(/G_i)]|$ is large (say > 4) it may be more convenient to work with its matrix form, $M[G(/G_i)]$. The matrix is defined as an $r \times r$ matrix whose diagonal and off-diagonal elements are the sets $S_{ii}$ and $S_{ij}$ respectively (c.f. Section 4.1). FIG. 6 shows $M$ matrices of several coset representations of some point groups.
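The matrix $M[G(/G_i)]$ can be built mechanically from the cosets. The sketch below does this for $D_3(/C_2)$, modelling $D_3$ by the six permutations of three points; this model, the multiplication convention and the helper names are assumptions made for illustration, and the homomers are represented by the right cosets $G_i g_k$ of eqn (2).

```python
from itertools import permutations

def mul(p, q):
    """Product p*q on 0..n-1: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

G = list(permutations(range(3)))          # model of D3
H = [(0, 1, 2), (1, 0, 2)]                # a C2 subgroup: identity and one two-fold axis

# Right cosets H g, collected without duplicates.
cosets = []
for g in G:
    coset = frozenset(mul(h, g) for h in H)
    if coset not in cosets:
        cosets.append(coset)

def act(coset, g):
    """Right multiplication sends the coset H g_k to H g_k g."""
    return frozenset(mul(x, g) for x in coset)

# M[i][j] = S_ij, the set of elements carrying coset i to coset j.
M = [[{g for g in G if act(ci, g) == cj} for cj in cosets] for ci in cosets]

for row in M:
    print([sorted(entry) for entry in row])
```

Every row of the matrix partitions the group into $r$ sets of size $|G_i|$, which is the coset decomposition referred to in item a) of section 4.1; the diagonal sets are the stabilizers of the individual homomers.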

4.3 Graphical Representation of eqn (3): Subduced Representations

$G(/G_i) \downarrow G_j$ is usually expressed as a sum of CR's of $G_j$, viz.,

$$G(/G_i) \downarrow G_j = a_{jk}\, G_j(/G_k) + a_{jl}\, G_j(/G_l) + \ldots \qquad ...(7)$$

where $G_k$, $G_l$, ... are subgroups of $G_j$ and the $a$'s are non-negative multiplication factors.

The “subduction” set of equivalent configurations described above may be obtained graphically by the following steps:

a) Find the Cayley-like colour graph of $G(/G_i)$, $\mathcal{D}[G(/G_i)]$, as described in section 4.1.

b) An arc (or a loop) in $\mathcal{D}[G(/G_i)]$ is annihilated (i.e. pruned out) unless one of its colour components belongs to $G_j$, the subducing group.

c) The result of b) is, in general, a set of disconnected $\mathcal{D}$'s of the general form

$$\{\mathcal{D}[G_j(/G_k)],\ \mathcal{D}[G_j(/G_l)],\ \ldots\} \qquad ...(9)$$

where, in general, each component of the above set may be repeated $a$ times [c.f. eqn. (7)].

d) The result of subduction is obtained by comparing the resulting diagrams with those shown in FIG. 5. FIG. 7 illustrates such a graphical subduction of the CR $T(/C_3)$ by $C_2$ and by $C_3$.

Alternatively the set $G(/G_i) \downarrow G_j$ might be obtained by subducing the matrix $M[G(/G_i)]$ in the following steps:

a) Annihilate from $M[G(/G_i)]$ all $g \notin G_j$ to give $M[G(/G_i) \downarrow G_j]$.

b) If $M[G(/G_i) \downarrow G_j]$ is not already in block form, then apply to it the appropriate set of row/column operations to transform it into block form. (The notation $H_{ij}$ ($K_{ij}$) = interchanging rows (columns) $i$ and $j$.) A general form of the matrix which results in this step is given by:

$$\begin{pmatrix} [G_j(/G_k)] & & & O \\ & [G_j(/G_l)] & & \\ & & \ddots & \\ O & & & [G_j(/G_m)] \end{pmatrix} \qquad ...(10)$$

where $G_k$, $G_l$, ..., $G_m$ are subgroups of $G_j$.

c) The block matrix, (10), represents the disconnected Cayley-like colour graphs:

$$\{\mathcal{D}[G_j(/G_k)],\ \mathcal{D}[G_j(/G_l)],\ \ldots,\ \mathcal{D}[G_j(/G_m)]\} \qquad ...(11)$$

d) Then the matrix given by eqn (10) corresponds to the subduction expression given by eqn. (7).
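The pruning steps above amount to finding the orbits of the subducing group on the cosets and the stabilizer belonging to each orbit. The following sketch carries this out for $T(/C_2)$ subduced by $D_2$. It models $T$ by the even permutations of four points, which together with the multiplication convention and the helper names is an assumption made for illustration.

```python
from itertools import permutations

def mul(p, q):
    """Product p*q on 0..n-1: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def is_even(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0

T = [p for p in permutations(range(4)) if is_even(p)]    # 12 rotations
C2 = [(0, 1, 2, 3), (1, 0, 3, 2)]                        # one two-fold axis
D2 = C2 + [(2, 3, 0, 1), (3, 2, 1, 0)]                   # all three two-fold axes

# Right cosets of C2 in T model the homomers of T(/C2).
cosets = []
for g in T:
    coset = frozenset(mul(h, g) for h in C2)
    if coset not in cosets:
        cosets.append(coset)

def act(coset, g):
    return frozenset(mul(x, g) for x in coset)

# Keeping only arcs coloured by elements of D2 leaves the orbits of D2 as components.
orbits = {frozenset(act(c, g) for g in D2) for c in cosets}

# Each component is a CR D2(/G_k); G_k is the stabilizer of one of its cosets.
stabilizers = set()
for orbit in orbits:
    c0 = next(iter(orbit))
    stabilizers.add(frozenset(g for g in D2 if act(c0, g) == c0))

print(len(cosets), sorted(len(o) for o in orbits), len(stabilizers))
```

Under these assumptions the six cosets split into three components of two cosets each, with three different two-fold stabilizers, in line with a decomposition of the form $D_2(/C_2) + D_2(/C_2') + D_2(/C_2'')$.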

Example  1 

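$$T(/C_2) \downarrow D_2 = D_2(/C_2) + D_2(/C_2') + D_2(/C_2'') \qquad ...(13)$$

We observe that $|\mathcal{H}[T(/C_2)]| = 6$ and hence it is more convenient to work with matrix forms; steps a)-d) are illustrated below:

$$M[T(/C_2)] = \begin{pmatrix}
\{I, C_2\} & \{C_2', C_2''\} & \{C_{3(1)}, C_{3(3)}\} & \{C_{3(2)}, C_{3(4)}\} & \{C'_{3(2)}, C'_{3(3)}\} & \{C'_{3(1)}, C'_{3(4)}\} \\
\{C_2', C_2''\} & \{I, C_2\} & \{C_{3(2)}, C_{3(4)}\} & \{C_{3(1)}, C_{3(3)}\} & \{C'_{3(1)}, C'_{3(4)}\} & \{C'_{3(2)}, C'_{3(3)}\} \\
\{C'_{3(1)}, C'_{3(3)}\} & \{C'_{3(2)}, C'_{3(4)}\} & \{I, C_2'\} & \{C_2, C_2''\} & \{C_{3(3)}, C_{3(4)}\} & \{C_{3(1)}, C_{3(2)}\} \\
\{C'_{3(2)}, C'_{3(4)}\} & \{C'_{3(1)}, C'_{3(3)}\} & \{C_2, C_2''\} & \{I, C_2'\} & \{C_{3(1)}, C_{3(2)}\} & \{C_{3(3)}, C_{3(4)}\} \\
\{C_{3(2)}, C_{3(3)}\} & \{C_{3(1)}, C_{3(4)}\} & \{C'_{3(3)}, C'_{3(4)}\} & \{C'_{3(1)}, C'_{3(2)}\} & \{I, C_2''\} & \{C_2, C_2'\} \\
\{C_{3(1)}, C_{3(4)}\} & \{C_{3(2)}, C_{3(3)}\} & \{C'_{3(1)}, C'_{3(2)}\} & \{C'_{3(3)}, C'_{3(4)}\} & \{C_2, C_2'\} & \{I, C_2''\}
\end{pmatrix}$$

Keep only $g \in D_2$:

$$M[T(/C_2) \downarrow D_2] = \begin{pmatrix}
\{I, C_2\} & \{C_2', C_2''\} & O & O & O & O \\
\{C_2', C_2''\} & \{I, C_2\} & O & O & O & O \\
O & O & \{I, C_2'\} & \{C_2, C_2''\} & O & O \\
O & O & \{C_2, C_2''\} & \{I, C_2'\} & O & O \\
O & O & O & O & \{I, C_2''\} & \{C_2, C_2'\} \\
O & O & O & O & \{C_2, C_2'\} & \{I, C_2''\}
\end{pmatrix}$$

$$M[T(/C_2) \downarrow D_2] = \begin{pmatrix}
[D_2(/C_2)] & O_2 & O_2 \\
O_2 & [D_2(/C_2')] & O_2 \\
O_2 & O_2 & [D_2(/C_2'')]
\end{pmatrix}$$

where $O_2$ is a 2 × 2 null matrix.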
In this example no row/column operations were necessary, i.e. the subduced matrix was already in block form. The following example represents a more general situation where one must apply a set of row/column operations to obtain the expression of the subduced representation:


An application of the model: Find the isomer-count matrix of the following substituted trishomocubane graph.


Trishomocubane is a caged compound which belongs to the $D_3$ point group [c.f. FIG. 1]. The compound can be envisaged as the fusion of six equivalent cyclopentane rings (or three norbornanes). The parent (unsubstituted) graph of this molecule has three $C_2$ axes of symmetry, each passing through a methylene carbon and the center of an opposite edge: One $C_2$ axis passes through vertex 4 and the edge joining vertices 1 and 8. Another $C_2$ axis passes through vertex 7 and the bond joining vertices 3 and 10. Finally a third $C_2$ axis passes through vertex 11 and the bond between vertices 5 and 6. The molecule also has two $C_3$ axes, each passing through vertices 2 and 9. Inspection of the graph shown in eqn. (15) reveals three orbits, viz.,

A1 = {4, 7, 11}  (colored as open circles)  ...  (15)

A subset of bivalent vertices.

A2 = {1, 3, 5, 6, 8, 10}  (colored as solid circles)  ...  (16)

A subset of trivalent vertices, each of which is adjacent to two trivalent vertices and one
bivalent vertex.

A3 = {2, 9}  (colored as open triangles)  ...  (17)

A subset of trivalent vertices, each vertex of which is surrounded by three trivalent
vertices.

FIGS. 1 and 3 show how to model the coset representations of the D3 point group, which lead
to a model of the mark table of this group in FIG. 8. The conventional (numerical)
form is given in Table 1.

Table 1

Mark table of D3 point-group

             C1    C2    C3    D3
D3(/C1)       6     0     0     0
D3(/C2)       3     1     0     0
D3(/C3)       2     0     2     0
D3(/D3)       1     1     1     1

Now, to define the coset representation which controls the orbit of substitution (A2 in
this case, eqn. (16)), we apply the symmetry elements of D3 to the vertices of A2 and
count the number of vertices left fixed by each of the subgroups that label the
mark table of the full group (Table 1). These operations are shown below; a tick
indicates that the element belongs to the subgroup heading the column.

          Permutation of A2         C1    C2    C3    D3
I         (1)(6)(5)(3)(10)(8)       √     √     √     √
C2        (1 8)(6 10)(3 5)                √           √
C2'       (3 10)(1 5)(6 8)                            √
C2''      (5 6)(1 10)(3 8)                            √
C3        (6 1 3)(5 8 10)                       √     √
C3'       (6 3 1)(5 10 8)                       √     √

          Fixed vertices:           C1    C2    C3    D3
                                   ( 6     0     0     0 )      ...  (18)

Comparison of the vector generated in eqn. (18) with the mark table, Table 1, indicates
that the orbit which controls substitution in A2 is of the D3(/C1) type. This is the CR which
must be subduced by the four subgroups of D3. The resulting USCI's are outlined
below, together with the corresponding generating functions, adopting the weights:


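\[ w(\mathrm{C}) = x; \qquad w(\mathrm{N}) = y \tag{19} \]

\[ \downarrow C_1: \qquad s_1^{6}, \qquad (x + y)^{6} \tag{20} \]

\[ \downarrow C_2: \qquad s_2^{3}, \qquad (x^{2} + y^{2})^{3} \tag{21} \]

\[ \downarrow C_3: \qquad s_3^{2}, \qquad (x^{3} + y^{3})^{2} \tag{22} \]

\[ \downarrow D_3: \qquad s_6, \qquad (x^{6} + y^{6}) \tag{23} \]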

An illustration of the subduction D3(/C1) ↓ C2 = 3 C2(/C1)

The Cayley graph of the CR D3(/C1) has six vertices, which represent its six homomer
models. This graph is too large to construct and then prune in order to
expand the required subduction. In this and similar situations the matrix
representation of the CR is a more convenient method. The homomers of D3(/C1) shown
in FIG. 3 transform into one another according to the following matrix:

M[D3(/C1)] =

        h1       h2       h3       h4       h5       h6
h1      {I}      {C2}     {C2'}    {C2''}   {C3}     {C3'}
h2      {C2}     {I}      {C3'}    {C3}     {C2''}   {C2'}
h3      {C2'}    {C3}     {I}      {C3'}    {C2}     {C2''}
h4      {C2''}   {C3'}    {C3}     {I}      {C2'}    {C2}
h5      {C3'}    {C2''}   {C2}     {C2'}    {I}      {C3}
h6      {C3}     {C2'}    {C2''}   {C2}     {C3'}    {I}        ...  (24)

To obtain the required expression for the subduction by C2, we keep only the
elements of C2 (= {I, C2}) in the above matrix and carry out the appropriate
row/column operations, as shown below:

        h1    h2    h3    h4    h5    h6
h1      I     C2    0     0     0     0
h2      C2    I     0     0     0     0
h3      0     0     I     0     C2    0
h4      0     0     0     I     0     C2
h5      0     0     C2    0     I     0
h6      0     0     0     C2    0     I

= M[D3(/C1) ↓ C2]      ...  (25)

Interchanging columns 4 and 5 and then rows 4 and 5 brings the matrix into block form:

I     C2    0     0     0     0
C2    I     0     0     0     0
0     0     I     C2    0     0
0     0     C2    I     0     0
0     0     0     0     I     C2
0     0     0     0     C2    I

=

C2(/C1)   O2        O2
O2        C2(/C1)   O2
O2        O2        C2(/C1)      ...  (26)


where Kij = interchange of column i and column j, Hij = interchange of row i and row
j, and O2 is a 2 x 2 null matrix. Eqn. (26) then corresponds to three C2(/C1)'s, which is
what one obtains using conventional coset algebra, which requires both permutation
representations and mark tables of the subducing groups.
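
The same subduction can be cross-checked numerically. The following is a minimal sketch, not part of the original treatment: it takes the cycle notation of the six D3 operations on the orbit A2 as listed earlier (the assignment of the labels C2' and C2'' to particular permutations is an assumption, as are all function and variable names) and computes the orbits of A2 under each subgroup. Because the action is regular, every orbit of size |Gj| corresponds to one copy of Gj(/C1).

```python
def perm_from_cycles(cycles, points):
    """Build a vertex -> image mapping from disjoint cycles; unlisted points stay fixed."""
    mapping = {p: p for p in points}
    for cycle in cycles:
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            mapping[a] = b
    return mapping

POINTS = (1, 3, 5, 6, 8, 10)  # the orbit A2 of trishomocubane

# Action of D3 on A2, copied from the cycle notation in the text.
D3_ON_A2 = {
    "I": perm_from_cycles([], POINTS),
    "C2": perm_from_cycles([(1, 8), (6, 10), (3, 5)], POINTS),
    "C2'": perm_from_cycles([(3, 10), (1, 5), (6, 8)], POINTS),
    "C2''": perm_from_cycles([(5, 6), (1, 10), (3, 8)], POINTS),
    "C3": perm_from_cycles([(6, 1, 3), (5, 8, 10)], POINTS),
    "C3'": perm_from_cycles([(6, 3, 1), (5, 10, 8)], POINTS),
}

# Representative subgroups, given by the names of their elements.
SUBGROUPS = {
    "C1": ["I"],
    "C2": ["I", "C2"],
    "C3": ["I", "C3", "C3'"],
    "D3": list(D3_ON_A2),
}

def orbit_sizes(subgroup):
    """Sorted sizes of the orbits of A2 under the elements of the named subgroup."""
    elements = [D3_ON_A2[name] for name in SUBGROUPS[subgroup]]
    unseen, sizes = set(POINTS), []
    while unseen:
        frontier = [unseen.pop()]
        orbit = set(frontier)
        while frontier:
            p = frontier.pop()
            for g in elements:
                q = g[p]
                if q not in orbit:
                    orbit.add(q)
                    frontier.append(q)
        unseen -= orbit
        sizes.append(len(orbit))
    return sorted(sizes)

for name in SUBGROUPS:
    print(name, orbit_sizes(name))
```

Under C2 the six vertices split into three orbits of size two, in agreement with the three C2(/C1) blocks of eqn. (26).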

The resulting polynomials, eqns. (19)-(23), when expanded generate the following FP
matrix:

Table 2

Fixed-point matrix of the trishomocubane problem.


When the inverse of the mark table is applied to Table 2 we obtain the desired isomer-count
matrix, shown below:

Table 3

Isomer-count matrix of the trishomocubane graph.

where each star corresponds to a row from the mark table. The labelings of the stars
correspond to the heterocycles derived from this caged molecule. These are drawn in
FIG. 9.
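
The two tables can be recomputed from the quantities already given. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it reads the coefficient of x^(6-k) y^k from each generating function (x^d + y^d)^m of eqns. (20)-(23) to obtain a fixed-point row, and then solves row · (mark table) = fixed-point row by back substitution, which is equivalent to multiplying by the inverse of Table 1. The names `CYCLES`, `fixed_point_row` and `isomer_row` are hypothetical.

```python
from fractions import Fraction
from math import comb

SUBGROUPS = ("C1", "C2", "C3", "D3")

# Subducing D3(/C1) by each subgroup gives m cycles of length d,
# i.e. the generating function (x^d + y^d)^m of eqns. (20)-(23).
CYCLES = {"C1": (1, 6), "C2": (2, 3), "C3": (3, 2), "D3": (6, 1)}

# Table 1, rows D3(/C1), D3(/C2), D3(/C3), D3(/D3).
MARK_TABLE = [
    [6, 0, 0, 0],
    [3, 1, 0, 0],
    [2, 0, 2, 0],
    [1, 1, 1, 1],
]

def fixed_point_row(k):
    """Coefficients of x^(6-k) y^k in the four generating functions."""
    row = []
    for name in SUBGROUPS:
        d, m = CYCLES[name]
        row.append(comb(m, k // d) if k % d == 0 else 0)
    return row

def isomer_row(k):
    """Solve counts * MARK_TABLE = fixed_point_row(k) by back substitution."""
    fp = fixed_point_row(k)
    counts = [Fraction(0)] * 4
    for c in range(3, -1, -1):
        rest = sum(counts[r] * MARK_TABLE[r][c] for r in range(c + 1, 4))
        counts[c] = (Fraction(fp[c]) - rest) / MARK_TABLE[c][c]
    return [int(v) for v in counts]

for k in range(7):
    print(k, fixed_point_row(k), isomer_row(k))
```

Each isomer row lists the number of isomers with k nitrogen atoms whose symmetry is C1, C2, C3 and D3 respectively; the row sums agree with a direct Burnside count over the six operations.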

5. Modeling Algebraic Properties of Mark and Group-Subduction Tables

5.1 Properties of Marks

5.1.1 The row of marks of G(/C1) has the general form

(|G|  0  0  ...  0)  ...(27)

where the number of 0's = |SSG| - 1. This is because every loop in the Cayley graph of
G(/C1) is coloured with just one component, viz., C1; recalling that |G(/C1)| = |G| is
also the number of vertices of this Cayley graph, the general form of the row of marks
given by eqn. (27) results.


5.1.2 The row of marks of G(/G) has the general form:

(1  1  ...  1)

where the number of 1's = |SSG|. This property, which one observes at the bottom of
mark tables, results from the fact that the Cayley graph of G(/G) has the general form
of a single vertex whose loop is coloured with every subgroup of G, i.e., it takes the
general form:

{C1, G2, ..., G}  ...(29)


5.1.3 The row of marks of a given G(/Gi) has only two values, viz.,


...(30)


iff all the loops of the Cayley graph of G(/Gi) are identically coloured with the same subgroups.
Examples are shown in FIG. 2 for D2(/C2), D2(/C'2) and D2(/C''2).
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5.2  Properties  of  Subduced  Representations 

5.2.1 There are two dominant features of the general sum given by eqn. (7), namely:

a) The number of homomers which represents (models) a given coset representation,
G(/Gi), equals the number of homomers which models a given subduced representation of
G(/Gi) by one of its subgroups, i.e.:

|H[G(/Gi)]| = |H[G(/Gi) ↓ Gj]|  ...(31)

b) The two sets of homomers of eqn. (31) have identical transformation properties
under all g ∈ Gj, i.e. under all symmetry operations of the subducing group.

These two properties may be modelled by considering, for example:

T(/C3) ↓ C3 = C3(/C1) + C3(/C3);  C3 = {I, C3, C3'}

[Diagram: of the four homomers of T(/C3), the one invariant under C3 models C3(/C3), and
the three invariant only under C1 model C3(/C1).]

T(/C3) ↓ C3 = C3(/C3) + C3(/C1)  ...(32)

Property a) is also understood using graphical subduction of the Cayley diagram,
since the total number of vertices in the fragmented graph is preserved.

5.2.2 Subduction of the identity representation of a group leads to the identity
representation of the subducing group, i.e.,

G(/G) ↓ Gj = Gj(/Gj)  ...(33)

This result is understood from the general form of the Cayley diagram of G(/G),
being a single vertex the loop of which is coloured with all the subgroups of G. Then
subduction by C1 leaves only the colour component C1 to give C1(/C1), while in
general subduction by Gj leads to a vertex whose loop is coloured by all the
subgroups of Gj, which corresponds to Gj(/Gj), and so on. This property is modelled
below for the identity representation of the T point group.

[Diagram: the single vertex of T(/T), whose loop is coloured {C1, C2, C3, D2, T}, is subduced as follows.]

T(/T) ↓ C1:  loop {C1},                     giving C1(/C1)
T(/T) ↓ C2:  loop {C1, C2},                 giving C2(/C2)
T(/T) ↓ C3:  loop {C1, C3},                 giving C3(/C3)
T(/T) ↓ D2:  loop {C1, C2, C2', C2'', D2},  giving D2(/D2)

5.2.3  G(/C1) ↓ Gj = r Gj(/C1)  ...(35)

Here the Cayley graph of G(/C1) possesses loops which are coloured by C1 only, and
because C1 is a common subgroup of all subgroups, the resulting subduced Cayley
graphs will also have loops which are coloured by C1 only; they will therefore all be
identical regular representations of the subducing group, i.e. Gj(/C1). And
because r = |G| / |Gj|, there will be r such Cayley graphs of Gj(/C1).

5.2.4  G(/C1) ↓ C1 = |G| C1(/C1)  ...(36)

Eqn. (36) is understood from the fact that the Cayley graph of G(/C1) contains |G|
vertices, the loop of each of which is coloured with C1 while the arcs are coloured with g ∉ C1.
Subduction by C1 then fragments it into |G| vertices, each of which is nothing else
but C1(/C1). Eqns. (35) and (36) are modelled below:


where in the last eqn. r = |G|/|C1| = |G|.

5.2.5  G(/Gi) ↓ Gj = r Gj(/Gj)  ...(38)

iff all hi ∈ H[G(/Gi)] remain invariant under Gj.

In this particular case the loops of the vertices of the Cayley graph of G(/Gi) are coloured with all the
subgroups of Gj (because by assumption all r homomers of H[G(/Gi)] remain
invariant under all g ∈ Gj), and hence none of the arcs is coloured with any g ∈ Gj.
Subduction by Gj then generates r isolated vertices whose loops are coloured with all
subgroups of Gj, i.e. by {SSGj}, which is nothing else but r copies of the identity
representation Gj(/Gj). This case is modelled below:


This property is exemplified in FIG. 10.


...(39)


6. Discussion and Conclusions

Scheme 1 outlines the main features of this work:

Point Group G

Sequence of representative subgroups: {G1, G2, ..., Gs}; Gs = G
Set of Cosets: {G(/G1), G(/G2), ..., G(/G)}

Molecular graph which remains invariant under all g ∈ G
= Model of the identity representation, G(/G)

↓

Colouring of the model of G(/G) so that it remains fixed
only under g ∈ Gi;  1 ≤ i ≤ s

↓

A set of equivalent (coloured) configurations (homomers)
for each G(/Gi) ~ H[G(/Gi)].

↓

Mark table  ←  Cayley colour diagram for each G(/Gi)  →  Subduction table
Scheme 1

An abstract description of a physical phenomenon remains the most precise and exact
description, while a given model is always going to be approximate; cf. the three
(popular) physical chemistry models of an ideal gas, ideal electrolyte and ideal
solution. However, modelling usually serves both educational and theoretical
purposes. In the present work several group-theoretical properties are rederived or
become more easily envisaged through modelling, namely eqn. 3, which defines
subductions of a given CR, and the concept of a mark: FIG. 7 is a pictorial illustration
of eqn. 3, and FIG. 8 portrays a certainly more appealing form of the mark table (of D3 as
an example; cf. Table 1). We believe that the preparation of Cayley diagrams of
CR's of the point groups of chemical interest is a worthy task and that such an
appendix in a text which deals with enumeration of chemical structures is at least as
important as appendices which contain mark tables, subduction tables and their
"predecessors", character tables!

In conclusion, the model presented here translates the three basic (and abstract)
alphabets used in the "enumeration journey" of Fujita, viz., coset representation,
mark and subduction of a group, into the language of graphs. The latter shall always
remain the more appealing for chemists. Indeed the present work extends an invitation
to organic chemists who would like to see organic chemistry from a theoretical
(computational) viewpoint but who are also repelled by the (sometimes) offensive algebra
in their way. It may be convenient to end this paper with a parody of a famous Greek myth
(Rex Warner, "Men and Gods", Kenkyusha, Tokyo 1959).

"An organic chemist demanded to know the riddle and the sphinx said: "What is it
that controls elements in a group, controls atoms in a compound and finally isomers in
organic chemistry?" "Is it a coset representation or a mark?" replied the organic
chemist. The sphinx found that her riddle was at last answered and died as was fated.
The organic chemist received his award and he was made King of the heaven!"
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Appendix  1 

One-to-one correspondence between the cosets of a given representation of a group and
the corresponding set of homomers, illustrated for D3(/C2). The permutations resulting
from the effect of the symmetry elements are outlined below; a tick indicates that the
element belongs to the subgroup heading the column.

D3(/C2) = C2C2 + C2C3 + C2C'3

          1    2    3     Permutation     C1    C2    C3    D3
I         1    2    3     (1)(2)(3)       √     √     √     √
C2        1    3    2     (1)(23)               √           √
C2'       3    2    1     (13)(2)                           √
C2''      2    1    3     (12)(3)                           √
C3        2    3    1     (123)                       √     √
C3'       3    1    2     (132)                       √     √

-> row of marks: (3  1  0  0)
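
The row of marks can be reproduced directly from the coset images tabulated above. The following is a small sketch under the assumption that the mark of a subgroup is the number of cosets left fixed by every element of that subgroup; the dictionary and function names are illustrative only.

```python
# Images of the cosets 1, 2, 3 under each element of D3, copied from the table above.
ACTION = {
    "I": (1, 2, 3),
    "C2": (1, 3, 2),
    "C2'": (3, 2, 1),
    "C2''": (2, 1, 3),
    "C3": (2, 3, 1),
    "C3'": (3, 1, 2),
}

# Representative subgroups of D3, given by the names of their elements.
SUBGROUPS = {
    "C1": ["I"],
    "C2": ["I", "C2"],
    "C3": ["I", "C3", "C3'"],
    "D3": list(ACTION),
}

def mark(subgroup):
    """Number of cosets fixed by every element of the named subgroup."""
    return sum(
        all(ACTION[g][point - 1] == point for g in SUBGROUPS[subgroup])
        for point in (1, 2, 3)
    )

def row_of_marks():
    return [mark(name) for name in ("C1", "C2", "C3", "D3")]

print(row_of_marks())
```

The result matches the D3(/C2) row of Table 1.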

The following correspondences are observed (cf. FIG. 3):

C2C2 = {I, C2} ~ h1  ...(A-1)

C2C3 = {C3, C2''} ~ h2  ...(A-2)

C2C'3 = {C3', C2'} ~ h3  ...(A-3)

The symmetry elements of D3 generate identical cycle structures when they operate on
the homomers h1 - h3:

          h1    h2    h3    Permutation
I         h1    h2    h3    (1)(2)(3)
C2        h1    h3    h2    (1)(23)
C'2       h3    h2    h1    (2)(13)
C''2      h2    h1    h3    (3)(12)
C3        h2    h3    h1    (123)
C'3       h3    h1    h2    (132)
Fig. 1) Model graphs of five point groups. Some elements of symmetry are shown.
The (open) vertices represent orbits of substitution. The indicated graphs model
the identity coset representations, G(/G), of the point groups.

Fig. 2) Sets of homomers (equivalent configurations) which model coset
representations of the C2, C3 and D2 point groups. The indicated row vectors are the
mark rows corresponding to each coset representation.

Fig. 3) Modelling coset representations of the D3 point group. The coloured graphs in
braces are homomers which generate the indicated mark rows.

Fig. 4) Coset representations of the T point group modelled by the appropriate sets of
homomers along with the corresponding mark rows. In all cases the set of homomers
which models G(/Gi) remains fixed only under the symmetry elements of Gi.

Fig. 5) Cayley colour diagrams which represent coset representations of
several point groups. Both arcs and loops are coloured by appropriate (sub)groups.
The number of vertices in each graph = the number of homomers which
model the corresponding coset representation. Mark rows are indicated. Observe
that when Gi = G, the diagram is a single vertex the loop of which is coloured with all
subgroups of G. When Gi = C1 the size of the diagram = |G|.

Fig. 6) Matrix representations of coset representations of several point groups; cf.
section 4.1.

Fig. 7) Graphical modelling of eqn. (3) illustrated for T(/C3) ↓ C3 and T(/C3) ↓ C2
through the use of the Cayley colour diagram of T(/C3).

Fig. 8) A "graphical form" of the mark table of D3. The mark corresponding to a given
subgroup is the number of homomers drawn under this subgroup, where ∅ is an
empty set. Observe that hi ∈ H[G(/Gi)] remains invariant under any g ∈ Gi, g
being a symmetry element. The graphical form of the mark table makes the
properties of marks more visible.

Fig. 9) Heterocyclic derivatives of trishomocubane which correspond to
Table 3 (the isomer-count matrix).
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Abstract 

Two definitions of the problem of graph drawing are considered
and an analytical solution is provided for each of them. The solutions
obtained make use of the eigenvectors of the Laplacian matrix
of a related structure. The procedures give good results for symmetrical
graphs and they have already been used for drawing Fullerene
molecules in the literature. The analysis characterises precisely what
problems the two procedures are solving. It also illuminates why they
can perform unsatisfactorily on asymmetrical graphs.


1  Introduction 

We consider the problem of embedding a graph on n vertices in Euclidean
space $\mathbb{R}^k$ for k < n. Typically k would be 3 or 2. By posing the problem
as minimising the squared norm of the appropriately weighted distance between
adjacent points subject to natural normalising conditions we arrive
at a formulation of the problem for which the optimal solution can be simply
computed in terms of the eigenvectors of the Laplacian matrix of the
(weighted) graph. For the case where the weights are chosen to be unity
the solution is independent of the uniform penalty given to non-adjacent
vertices. In this case and for regular graphs the technique has been applied
by Pisanski [9], who demonstrated that the generated drawings are particularly
pleasing in the case of Fullerene graphs arising in chemistry. The
idea of using eigenvectors for drawing graphs was used first in a chemical
setting for molecular orbitals; see [8]. A similar technique has been developed
by Bolla [2] for generating Euclidean representations of hypergraphs. For
distance-regular graphs with a second eigenvalue of multiplicity at least k
the embedding has interesting properties; see Godsil [4].

This paper demonstrates that a problem that has traditionally been solved
by gradient descent techniques, used to minimise a measure of the poorness of
the generated embedding, affords an analytical solution which can be
implemented in an efficient deterministic algorithm [9]. At the same time it
reveals significant insights into the relations between embeddings of graphs
and the structure of the eigenspaces of their Laplacian matrices.

The Laplacian matrix has been used in graph embedding before, in Tutte's
straight line embedding of planar graphs [10, 11]. The approach presented
here is related but corresponds to solving the equation without boundary
conditions. The characterisation in terms of minimising the sum of distances
between vertices is also appropriate in Tutte's case, but subject to the chosen
cycle being fixed at the boundary; see also Becker and Hotz [1].

Use of eigenvectors to generate embeddings is not new. As early as 1980
Kruskal and Seery [6] devised a sophisticated method for drawing network
diagrams using a statistical technique called Multidimensional Scaling
(MDS) [5, 7] to arrive at a matrix whose eigenvectors could be viewed as
embedding vectors. The approach is closely related to that presented here,
but is not characterised in terms of a tightly defined optimization problem.
In Section 4 we discuss in detail the relationship between their method and
one of our techniques. It transpires that in certain special cases the solutions
obtained by the two methods are, up to scaling factors, identical. The main
advantage of our approach is the theoretical explanation in terms of the two
optimization problems, which elucidates the strengths and weaknesses of the
two methods.
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2  Notation  and  Known  Results 


Let $A(G) = (A_{uv})$ be the adjacency matrix of a simple (positively weighted)
n-vertex graph G with no loops. Note that u, v are understood to be adjacent
iff $A_{uv} > 0$. For non-adjacent vertices $A_{uv} = 0$. Let D be the n x n diagonal
matrix with non-zero entries

\[ D_{vv} = \sum_{u} A_{uv}, \]

the weighted degree of vertex v. The Laplacian matrix is defined to be
$Q(G) = Q(A) = D - A$, where $A = A(G)$.

We summarise a few known results involving the Laplacian matrix. We will
number the eigenvalues of Q(G) in ascending order, $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$,
with corresponding eigenvectors $j = e^1, e^2, \dots, e^n$, where j is the
all-one vector, while $0 < \lambda_2$ if the graph is connected. In addition, for any
n-dimensional real vector x it can be verified that

\[ x^T Q(G)\, x = \sum_{(u,v) \in E(G)} A_{uv} (x_u - x_v)^2. \tag{1} \]
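
Identity (1) is easy to check on a small example. The sketch below is an illustration only; the four-vertex weighted graph, the test vector and all names are assumptions made for the example. It builds Q = D - A and compares the quadratic form with the weighted sum over edges, each edge being counted once.

```python
def laplacian(adj):
    """Q = D - A for a symmetric weighted adjacency matrix with zero diagonal."""
    n = len(adj)
    return [
        [(sum(adj[u]) if u == v else 0) - adj[u][v] for v in range(n)]
        for u in range(n)
    ]

def quadratic_form(q, x):
    """x^T Q x."""
    n = len(x)
    return sum(x[u] * q[u][v] * x[v] for u in range(n) for v in range(n))

def edge_sum(adj, x):
    """Sum of A_uv (x_u - x_v)^2 over unordered pairs; non-edges contribute zero."""
    n = len(x)
    return sum(
        adj[u][v] * (x[u] - x[v]) ** 2 for u in range(n) for v in range(u + 1, n)
    )

# A hypothetical weighted 4-cycle.
WEIGHTS = {(0, 1): 2, (1, 2): 1, (2, 3): 3, (0, 3): 1}
N = 4
ADJ = [[0] * N for _ in range(N)]
for (u, v), w in WEIGHTS.items():
    ADJ[u][v] = ADJ[v][u] = w

Q = laplacian(ADJ)
X = [1, -2, 0, 4]
print(quadratic_form(Q, X), edge_sum(ADJ, X))
```

Every row of Q sums to zero, which is the statement that the all-one vector j is an eigenvector with eigenvalue 0.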


3  Graph  Drawing  Problem  and  Initial  Result 

We pose the problem of embedding a graph G as finding a mapping

\[ \tau : V(G) \to \mathbb{R}^k. \]

We will place constraints on this mapping in order to ensure that the representation
is natural and hopefully pleasing. We will denote by $\tau_i$ the
n-dimensional vector formed by taking the i-th coordinate of $\tau(u)$ for all
$u \in V(G)$. Thus $\tau_i$ is an n-dimensional vector indexed by the vertices of
the graph G. Our first requirement is that the centre of gravity of the representation
be at the origin. This implies that the vectors $\tau_i$ have average
entry 0, or $\tau_i \perp j$, for $i = 1, \dots, k$. The next constraint is that the scaling
in all dimensions be similar. This is ensured by requiring that

\[ \|\tau_i\|^2 = \sum_{u=1}^{n} \tau(u)_i^2 = 1. \]

Note that throughout this paper the norm notation $\|\cdot\|$ will, as here, refer
to the 2-norm. Finally we would like the embedding to retain maximum
information about the graph. An example of how information can be lost is
given when $\tau_i = \tau_j$ for some $i \ne j$, i.e. $\tau_i$ and $\tau_j$ are maximally correlated.
In this case we have effectively reduced the dimension of the representation
by one. Hence maximal information will be represented if the vectors have
zero correlation, i.e. $\tau_i \perp \tau_j$ for $i \ne j$. We require adjacent vertices to be
close together, weighted according to $A_{uv}$ (e.g. for different chemical bond
types the value might vary), and require non-adjacent vertices to be far
apart. Our definition of the graph drawing problem may therefore be stated
as follows.

Problem 3.1 Graph Drawing of an n-vertex graph G given by (weighted)
adjacency matrix A in $\mathbb{R}^k$, k < n.

Find a mapping $\tau : V(G) \to \mathbb{R}^k$ which minimises the following energy
function

\[ E(\tau) = \sum_{(u,v) \in E(G)} A_{uv} \|\tau(u) - \tau(v)\|^2 - \beta \sum_{(u,v) \notin E(G)} \|\tau(u) - \tau(v)\|^2, \]

subject to the constraints

\[ \|\tau_i\| = 1, \quad \tau_i \perp j, \quad \text{for } i = 1, \dots, k, \]
\[ \tau_i \perp \tau_j, \quad \text{for } 1 \le i < j \le k, \]

where $\beta$ is a positive constant controlling the strength of the force driving
non-adjacent vertices apart. ■

Before proceeding, some further discussion of our problem definition is warranted.
Firstly, there seems to be some arbitrariness in the fact that we can
specify different 'attractions' between vertices but non-adjacent vertices are
all 'repelled' with equal force. We will show that the more general problem
created by allowing negative weights can also be solved using the techniques
derived for Problem 3.1.

Another aspect of the definition that is a little unsatisfactory is the requirement
that the scaling be similar in all directions. Indeed we will see that the
method does not work well for highly asymmetrical graphs. In order to avoid
this artificial symmetrisation we propose the following second definition of
the graph drawing problem, albeit with a similar flavour to Problem 3.1.

Problem 3.2 Graph Drawing of an n-vertex graph G given by (weighted)
adjacency matrix A in $\mathbb{R}^k$, k < n.

Find a mapping $\tau : V(G) \to \mathbb{R}^k$ such that the function

\[ E(\tau_i) = \sum_{(u,v) \in E(G)} A_{uv} |\tau(u)_i - \tau(v)_i|^2 - \beta \sum_{(u,v) \notin E(G)} |\tau(u)_i - \tau(v)_i|^2 = 1, \]

for $i = 1, \dots, k$, while maximising the sum of the norms

\[ \sum_{i=1}^{k} \sum_{u \in V(G)} \tau(u)_i^2 \]

subject to the constraints

\[ \tau_i \perp j, \quad \text{for } i = 1, \dots, k, \]
\[ \tau_i \perp \tau_j, \quad \text{for } 1 \le i < j \le k, \]

where $\beta$ is a positive constant controlling the strength of the force driving
non-adjacent vertices apart. ■


Note that this model allows the $\tau_i$ to have different norms, but specifies
that a unit length of 'wire' is available in each dimension to create the
model. Clearly changing the amount of wire simply has a scaling effect on
the solution, so that the problem is well-posed if the number 1 is replaced by
any constant. Note also that the requirement implies that the amount of
'wire' used is the same for all directions, since the norms are sums of squares
over the coordinates. This observation lends the definition a naturalness
that matches the definition of Problem 3.1.

We  are  now  in  a  position  to  state  our  main  result. 


Theorem 3.1 Let G be a connected n-vertex weighted graph with adjacency
matrix A. The graph drawing problem given in Problem 3.1 is solved by
taking the weighted graph with adjacency matrix B with entries

\[ B_{uv} = \begin{cases} A_{uv} + \beta & \text{if } (u,v) \in E(G), \\ 0 & \text{otherwise,} \end{cases} \]

and computing the eigenvectors $e^1, e^2, \dots, e^n$ with corresponding eigenvalues
$0 = \lambda_1 < \lambda_2 \le \dots \le \lambda_n$ of the Laplacian matrix Q(B). An optimal
embedding $\tau$ is given by $\tau_i = e^{i+1}$, $i = 1, \dots, k$, and the minimal value of
$E(\tau)$ is

\[ \sum_{\ell=2}^{k+1} \lambda_\ell - \beta n k. \]

If $\lambda_{k+1} < \lambda_{k+2}$ then the optimal embedding is unique up to orthogonal
transformations in $\mathbb{R}^k$.
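
As a worked illustration of the theorem, consider the unweighted cycle on n vertices with k = 2. The sketch below is not taken from the paper: it relies on the standard fact that the smallest non-zero Laplacian eigenvalue of the cycle is 2 - 2cos(2π/n), with the normalised cosine and sine vectors as an orthonormal pair of eigenvectors, so that the drawing is a regular n-gon. For an unweighted graph B = (1 + β)A, so the eigenvalues of Q(B) are those of Q(A) scaled by 1 + β. The function names are hypothetical.

```python
import math

def cycle_embedding(n):
    """Unit-norm cosine and sine vectors: eigenvectors of the cycle Laplacian."""
    scale = math.sqrt(2.0 / n)
    tau1 = [scale * math.cos(2 * math.pi * u / n) for u in range(n)]
    tau2 = [scale * math.sin(2 * math.pi * u / n) for u in range(n)]
    return tau1, tau2

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def energy(n, beta, tau1, tau2):
    """E(tau) of Problem 3.1 for the unweighted cycle, each vertex pair counted once."""
    total = 0.0
    for u in range(n):
        for v in range(u + 1, n):
            d2 = (tau1[u] - tau1[v]) ** 2 + (tau2[u] - tau2[v]) ** 2
            adjacent = (v - u) in (1, n - 1)
            total += d2 if adjacent else -beta * d2
    return total

def theorem_minimum(n, beta, k=2):
    """Sum of the k smallest non-zero eigenvalues of Q(B), minus beta*n*k."""
    lam2 = 2 - 2 * math.cos(2 * math.pi / n)  # double eigenvalue of Q(A) for the cycle
    return k * (1 + beta) * lam2 - beta * n * k

t1, t2 = cycle_embedding(6)
print(energy(6, 0.5, t1, t2), theorem_minimum(6, 0.5))
```

The directly evaluated energy of the regular polygon agrees with the value predicted by the theorem, and the two coordinate vectors satisfy the constraints of Problem 3.1.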

Corollary 3.1 In the case where the graph is not weighted (i.e. $A_{uv} \in \{0, 1\}$),
the optimal embedding does not depend on the parameter $\beta$.

Proof: If the graph is not weighted and has adjacency matrix A, then
$B = (1 + \beta)A$. Hence the Laplacian matrices Q(A) and Q(B) also satisfy
$Q(B) = (1 + \beta)Q(A)$. This implies that they have the same eigenvectors, with
the corresponding eigenvalues of Q(B) multiplied by a factor of $1 + \beta$. Hence
by the theorem the optimal embedding does not depend on the parameter $\beta$. ■
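
The scaling argument can be checked exactly on a small case. The sketch below is illustrative only; the path on three vertices, the value of β and the names are assumptions. It uses exact fractions to confirm that Q(B) = (1 + β)Q(A) and that an eigenvector of Q(A) remains an eigenvector of Q(B) with its eigenvalue multiplied by 1 + β.

```python
from fractions import Fraction

def laplacian(adj):
    """Q = D - A for a symmetric adjacency matrix with zero diagonal."""
    n = len(adj)
    return [
        [(sum(adj[u]) if u == v else 0) - adj[u][v] for v in range(n)]
        for u in range(n)
    ]

def apply(q, x):
    """Matrix-vector product Q x."""
    return [sum(q[u][v] * x[v] for v in range(len(x))) for u in range(len(x))]

A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # unweighted path on three vertices
BETA = Fraction(1, 2)
B = [[a + BETA if a else 0 for a in row] for row in A]  # B_uv = A_uv + beta on edges

QA, QB = laplacian(A), laplacian(B)
X = [1, 0, -1]  # eigenvector of Q(A) with eigenvalue 1

print(apply(QA, X), apply(QB, X))
```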

Proof of Theorem 3.1 First note that we can rewrite the energy function
$E(\tau)$ as follows.

\[ E(\tau) = \sum_{(u,v) \in E(G)} (A_{uv} + \beta) \|\tau(u) - \tau(v)\|^2 - \beta \sum_{(u,v) \in E(K_n)} \|\tau(u) - \tau(v)\|^2, \tag{2} \]

where $K_n$ is the complete graph on the vertices of G with edges weighted 1.
If we consider the complete graph in equation (1), the following equality is
obtained for an n-dimensional real vector x.

\[ x^T Q(K_n)\, x = x^T (nI - J)\, x = \sum_{(u,v) \in E(K_n)} (x_u - x_v)^2, \tag{3} \]

where I is the n x n identity matrix and J is the n x n all 1's matrix,
i.e. $J_{ij} = 1$ for all i, j. In general we have the following relation for an
embedding $\tau$ and graph G with adjacency matrix A and its Laplacian matrix
Q(A).

\[ \sum_{(u,v) \in E(G)} A_{uv} \|\tau(u) - \tau(v)\|^2 = \sum_{(u,v) \in E(G)} A_{uv} \sum_{i=1}^{k} (\tau(u)_i - \tau(v)_i)^2 \]
\[ = \sum_{i=1}^{k} \sum_{(u,v) \in E(G)} A_{uv} (\tau(u)_i - \tau(v)_i)^2 \]
\[ = \sum_{i=1}^{k} \tau_i^T Q(A)\, \tau_i, \tag{4} \]

by equation (1). Combining the results of equations (2), (3) and (4), we
obtain the following expression for the energy function $E(\tau)$.

\[ E(\tau) = \sum_{i=1}^{k} \tau_i^T \left[ Q(B) - \beta(nI - J) \right] \tau_i \tag{5} \]

Let $j = e^1, e^2, \dots, e^n$ be the eigenvectors of Q(B) with corresponding eigenvalues
$0 = \lambda_1 < \lambda_2 \le \dots \le \lambda_n$. Without loss of generality we may take
$\|e^\ell\| = 1$ for $\ell > 1$, since eigenvectors are only determined up to their direction.
Note that eigenvectors of a symmetric matrix are orthogonal and so
$e^i \perp e^j$ for $i \ne j$. We have

\[ \left[ Q(B) - \beta(nI - J) \right] e^1 = 0, \]

while for $\ell > 1$, $e^\ell \perp j$ and so

\[ \left[ Q(B) - \beta(nI - J) \right] e^\ell = (\lambda_\ell - \beta n)\, e^\ell. \]

Hence the eigenvectors of Q(B) are also eigenvectors of $Q(B) - \beta(nI - J)$.
Expressing $\tau_i$ in the eigenbasis, we have

\[ \tau_i = \sum_{\ell=1}^{n} \mu_i^\ell e^\ell, \]

where $\mu_i^1 = 0$ since $\tau_i \perp j$ ($j = e^1$). Hence we can write the energy of $\tau$ as

\[ E(\tau) = \sum_{i=1}^{k} \sum_{\ell=1}^{n} (\lambda_\ell - \beta n) (\mu_i^\ell)^2 = \sum_{\ell=1}^{n} \lambda_\ell \sum_{i=1}^{k} (\mu_i^\ell)^2 - \beta n k. \]

The condition $\tau_i \perp \tau_j$ now becomes $\mu_i \perp \mu_j$, while the condition $\|\tau_i\| = 1$
becomes $\|\mu_i\| = 1$. Since the $\mu_i$ can be extended to an orthonormal basis
matrix M for which $M^T$ is also orthonormal, we have

\[ \nu_\ell = \sum_{i=1}^{k} (\mu_i^\ell)^2 \le 1, \]

with $\sum_{\ell} \nu_\ell = k$. Hence, the minimum will occur when $\nu_\ell = 1$ for $\ell = 2, \dots, k+1$
and $\nu_\ell = 0$ for $\ell > k+1$. This can be achieved by taking
$\mu_i^{i+1} = 1$, $\mu_i^j = 0$ for $j \ne i+1$, or $\tau_i = e^{i+1}$, $i = 1, \dots, k$, as stated in the
theorem. Note that the minimum energy is

\[ \sum_{\ell=2}^{k+1} \lambda_\ell - \beta n k. \]

If $\lambda_{k+2} > \lambda_{k+1}$, then we must have $\nu_\ell = 0$ for $\ell > k+1$ for the minimum to be
achieved. This implies that $\tau_1, \dots, \tau_k$ span the same space as $e^2, \dots, e^{k+1}$,
and can be obtained by an orthogonal transformation of these vectors. Hence
the optimal embedding is unique up to orthogonal transformation in $\mathbb{R}^k$. ■


4  Applications  and  Further  Results 


We begin by addressing the problem touched on in the introduction concerning
the possibility of solving a problem for which the underlying graph
has negative weights.


Theorem 4.1 Let G be a connected n-vertex weighted graph with some negative
weights and adjacency matrix A. The graph drawing problem given in
Problem 3.1 is solved by taking the weighted graph with adjacency matrix B
with off-diagonal entries

\[ B_{uv} = \begin{cases} A_{uv} + \alpha + \beta & \text{if } (u,v) \in E(G), \\ \alpha & \text{otherwise,} \end{cases} \]

where $\alpha = -\min\{A_{uv} \mid (u,v) \in E(G)\} > 0$, and computing the eigenvectors
$e^1, e^2, \dots, e^n$ with corresponding eigenvalues $0 = \lambda_1 < \lambda_2 \le \dots \le \lambda_n$ of
the Laplacian matrix Q(B). An optimal embedding $\tau$ is given by $\tau_i = e^{i+1}$,
$i = 1, \dots, k$, and the minimal value of $E(\tau)$ is

\[ \sum_{\ell=2}^{k+1} \lambda_\ell - \dots \]

If $\lambda_{k+1} < \lambda_{k+2}$ then the optimal embedding is unique up to orthogonal
transformations in $\mathbb{R}^k$.


Proof: The theorem follows from Theorem 3.1 and the observation that

\[ E_{A + \alpha(J - I)}(\tau) = E_A(\tau) + \alpha k, \]

which follows from the computations performed in the proof of Theorem 3.1.
Hence, a minimum embedding for A is also a minimum embedding for
$A + \alpha(J - I)$, while the minimal value of $E(\tau)$ is $\alpha k$ less. ■

Hence the procedure can also be used to find optimal embeddings of graphs
with negative weights, as might occur in chemical bonds with different
repelling strengths.

A question which might naturally arise when considering a novel embedding
strategy is whether it is guaranteed to produce a 2-dimensional drawing with
no crossing edges when presented with a planar graph. For the algorithm of
Theorem 3.1, this turns out not to be the case, as the simple counter-example
in Figure 1 shows.


Figure  1:  Planar  graph  drawn  with  a  crossing  edge 


The graph is $C_7$ (the cycle on 7 vertices) with two extra edges, (1,5) and
(3,7). The graph is clearly planar, but Figure 1 shows the result of applying
the algorithm of Theorem 3.1.
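
Readers who wish to reproduce the counter-example can do so with a few lines of code. The sketch below is an illustration only: it assumes unit edge weights and $\beta = 0$, relabels the vertices 0 to 6, and uses hypothetical helper names (`spectral_layout`, `properly_cross`, `crossing_pairs`). It places the vertices using the second and third Laplacian eigenvectors and lists the pairs of vertex-disjoint edges whose straight-line segments cross.

```python
import numpy as np
from itertools import combinations

def spectral_layout(edges, n, k=2):
    """Coordinates from Laplacian eigenvectors 2..k+1 (unit edge weights assumed)."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    Q = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(Q)
    return vecs[:, 1:k + 1]

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return np.sign((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))

def properly_cross(p1, p2, p3, p4):
    """True if segments p1p2 and p3p4 intersect at a single interior point."""
    return (orient(p1, p2, p3) * orient(p1, p2, p4) < 0 and
            orient(p3, p4, p1) * orient(p3, p4, p2) < 0)

def crossing_pairs(edges, xy):
    """Pairs of vertex-disjoint edges whose drawn segments cross."""
    return [(e, f) for e, f in combinations(edges, 2)
            if not set(e) & set(f)
            and properly_cross(xy[e[0]], xy[e[1]], xy[f[0]], xy[f[1]])]

# C_7 with the chords (1,5) and (3,7), relabelled to 0..6.
edges = [(i, (i + 1) % 7) for i in range(7)] + [(0, 4), (2, 6)]
xy = spectral_layout(edges, 7)
print(crossing_pairs(edges, xy))
```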

A good example of the kind of image generated by our method is given in
Figure 2, which is the embedding generated for the Buckminster fullerene in
$\mathbb{R}^3$ using the 2nd, 3rd and 4th eigenvectors and taking a two-dimensional
projection.


Figure 2: The Buckminster fullerene. The coordinates are determined by
the 2nd, 3rd and 4th eigenvectors.


In our definition of the graph drawing task (see Problem 3.1), we require that
the drawing has normalised variance along the coordinate axes and that the
projections onto the coordinates are orthogonal. Together these constraints
imply that the drawing will have spherical symmetry in terms of its variance
along any axis, since along a (normalised) direction $y = (y_1, \ldots, y_k)$ the
variance is

$$\Bigl\| \sum_{i=1}^{k} y_i \tau_i \Bigr\|^2 = \sum_{i=1}^{k} y_i^2 \|\tau_i\|^2 = \sum_{i=1}^{k} y_i^2 = 1.$$
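
The spherical-symmetry claim is easy to check numerically. In the short sketch below the coordinate vectors are not taken from any particular graph: any set of orthonormal columns satisfies the two constraints, so a QR factorisation of a random matrix is used as a hypothetical stand-in, and `variance_along` is an illustrative helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 3

# Orthonormal columns standing in for the coordinate vectors tau_1, ..., tau_k.
tau, _ = np.linalg.qr(rng.standard_normal((n, k)))

def variance_along(tau, y):
    """Squared norm of the drawing projected onto the unit direction y / ||y||."""
    y = y / np.linalg.norm(y)
    return float(np.sum((tau @ y) ** 2))

# Five random directions; each value should equal 1 up to rounding.
values = [variance_along(tau, rng.standard_normal(k)) for _ in range(5)]
print(values)
```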

Hence, in a certain sense we are forcing the graph to “look spherical”. For
graphs with a naturally eccentric shape our method can break down. In
order to illustrate this effect, Figure 3 shows how the method draws the
Cartesian product of two paths $P_n \times P_m$, $2 \le n \le m \le 10$, in $\mathbb{R}^2$. The rows
of the figure are indexed by n, while the columns are indexed by m − n.
Hence the leftmost column contains drawings of $P_n \times P_n$ for n = 2, ..., 10,
while the top row contains the drawings of $P_2 \times P_m$ for m = 2, ..., 10.


Figure 3: The Cartesian product of two paths $P_n \times P_m$, $2 \le n \le m \le 10$,
where the coordinates are given by the second and third eigenvectors of the
Laplacian matrix.


Note that the figures become degenerate when the difference between m
and n is too large and both the second and third eigenvalues are inherited
from $P_m$, causing each copy of $P_n$ to map to a point. The method fails
to work because the second harmonic in the longer direction corresponds
to a lower Laplacian eigenvalue than the first harmonic in the orthogonal
direction. If equality of these two eigenvalues occurs then a mixture of
the two ‘modes’ appears in one coordinate, otherwise the second coordinate
becomes a quadratic function of the first and the graph drawing collapses
onto a line.
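
The degeneracy condition can be made explicit. The Laplacian eigenvalues of a path on L vertices with unit weights are $2 - 2\cos(\pi j / L)$, $j = 0, \ldots, L-1$, and the eigenvalues of a Cartesian product are sums of the factors' eigenvalues, so the second harmonic along $P_m$ lies below the first harmonic along $P_n$ exactly when $m > 2n$. The sketch below (function names are illustrative, unit weights and $\beta = 0$ are assumed) compares this closed form with a direct numerical computation.

```python
import numpy as np

def path_laplacian(n):
    """Laplacian of the path P_n (n >= 2) with unit weights."""
    Q = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    Q[0, 0] = Q[-1, -1] = 1.0
    return Q

def product_laplacian(n, m):
    """Laplacian of the Cartesian product P_n x P_m as a Kronecker sum."""
    return np.kron(path_laplacian(n), np.eye(m)) + np.kron(np.eye(n), path_laplacian(m))

def harmonic(length, j):
    """j-th Laplacian eigenvalue of a path: 2 - 2 cos(pi * j / length)."""
    return 2.0 - 2.0 * np.cos(np.pi * j / length)

def is_degenerate(n, m):
    """True when the second harmonic along P_m lies below the first along P_n (n <= m)."""
    return harmonic(m, 2) < harmonic(n, 1)

for n, m in [(3, 4), (3, 8), (2, 10)]:
    lam = np.linalg.eigvalsh(product_laplacian(n, m))
    # lam[1:3] are the two eigenvalues whose eigenvectors give the drawing.
    print(n, m, is_degenerate(n, m), lam[1:3])
```

The boundary case m = 2n, where the two eigenvalues coincide, is deliberately left out of the loop because a floating-point comparison cannot decide it reliably.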

In order to show that this problem is not confined to simple ‘two-dimensional’
graphs, we include a fullerene graph drawn using our technique,
which also possesses a degenerate image (see Figure 4). The graph shown
is taken from [8]. Although it is not immediately apparent from the figure, the
three-dimensional coordinates of the vertices all lie on a parabolic
(two-dimensional) surface, though in this case no pair of vertices is actually given
the same coordinates. This explains why in this case a better image is
created by taking the 2nd, 4th and 5th eigenvectors [8, 9], since the third
eigenvector is a harmonic of the second.


Figure  4:  A  fullerene  on  60  vertices.  The  coordinates  are  determined  by  the 
2nd,  3rd  and  4th  eigenvector. 


We conclude this section by presenting a solution to the second Graph Drawing
Problem 3.2, which to a certain extent overcomes the enforced symmetry
implicit in Problem 3.1.


Theorem 4.2 Let G be a connected n-vertex weighted graph with adjacency
matrix A. The graph drawing problem given in Problem 3.2 is solved by
taking the weighted graph with adjacency matrix B with entries

$$B_{uv} = \begin{cases} A_{uv} + \beta & \text{if } (u,v) \in E(G) \\ 0 & \text{otherwise} \end{cases}$$

and computing the eigenvectors $e^1, e^2, \ldots, e^n$ with corresponding eigenvalues
$0 = \lambda_1 < \lambda_2 \le \ldots \le \lambda_n$ of the Laplacian matrix Q(B). An optimal
embedding $\tau$ is given by

$$\tau_i = \frac{e^{i+1}}{\sqrt{\lambda_{i+1} - \beta n}}, \qquad i = 1, \ldots, k.$$


Proof: Using the analysis of Theorem 3.1 we can write

[equation not recoverable from the source]

where

[equation not recoverable from the source]

By the observation after the definition of Problem 3.2, the solution will be
invariant to orthogonal rotations. We may therefore assume that the $\tau_i$ are
aligned with the eigenvectors $e^{i+1}$ when projected into the subspace spanned
by $e^2, \ldots, e^{k+1}$. Hence, $\mu_i^j = 0$ for $j \le k+1$ with $j \ne i+1$. It is now clear that

$$\sum_i \|\tau_i\|^2$$

is maximised by also taking $\mu_i^j = 0$ for $j > k+1$, since the norm will
have to be smaller in order to have $B(\tau_i) = 1$ after multiplying by larger
eigenvalues. Hence, the optimal solution is given by taking $\tau_i = c_i e^{i+1}$. But
then

$$B(\tau_i) = (\lambda_{i+1} - \beta n)\, c_i^2 = 1, \qquad c_i = \frac{1}{\sqrt{\lambda_{i+1} - \beta n}},$$

as required. ■


The algorithm proposed in Theorem 4.2 has already been adopted
by Manolopoulos and Fowler [8], with improved results for less symmetrical
graphs compared with the algorithm of Theorem 3.1.
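
The scaling in Theorem 4.2 can be sketched in code. The example below is a hedged illustration rather than the implementation used in [8]: it assumes $\beta = 0$ by default, treats every non-zero entry of A as an edge, and uses the hypothetical helper names `scaled_embedding` and `path_adjacency`. With $\beta = 0$ each coordinate vector is divided by $\sqrt{\lambda_{i+1}}$, so that $\tau_i^T Q \tau_i = 1$ and the coordinate belonging to the smaller eigenvalue is stretched more, which removes the forced "spherical" shape.

```python
import numpy as np

def scaled_embedding(A, k, beta=0.0):
    """tau_i = e^{i+1} / sqrt(lambda_{i+1} - beta * n), with B = A + beta on edges."""
    n = A.shape[0]
    B = np.where(A != 0, A + beta, 0.0)   # non-zero entries of A are read as edges
    Q = np.diag(B.sum(axis=1)) - B
    lam, vecs = np.linalg.eigh(Q)
    scale = lam[1:k + 1] - beta * n
    if np.any(scale <= 0):
        raise ValueError("lambda_{i+1} - beta * n must be positive")
    return vecs[:, 1:k + 1] / np.sqrt(scale), Q

def path_adjacency(n):
    """Adjacency matrix of the path P_n with unit weights."""
    return np.eye(n, k=1) + np.eye(n, k=-1)

# Example graph: the grid P_3 x P_4.
A = np.kron(path_adjacency(3), np.eye(4)) + np.kron(np.eye(3), path_adjacency(4))
tau, Q = scaled_embedding(A, k=2)

print(np.diag(tau.T @ Q @ tau))        # each coordinate has unit Laplacian "energy"
print(np.linalg.norm(tau, axis=0))     # the longer grid direction gets the larger extent
```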

If we apply the MDS method of Kruskal and Seery [6] to a graph which is
vertex transitive (i.e. has a group of automorphisms which acts transitively
on the vertices, ensuring the graph ‘looks the same’ from the viewpoint of
any vertex), the result will be that $\tau_i$ is set to a different but related multiple
of $e^{i+1}$. This follows from the fact that the MDS method acts as a uniform
procedure in this case together with the fact that the eigenvectors of the
adjacency matrix of a regular graph coincide with those of its Laplacian
matrix.


5  Conclusions


We have presented an analysis characterising two graph drawing procedures
that have been adopted by different researchers, principally for drawing
fullerene molecules. The characterisation is pleasing in itself, but also
throws light on the performance of the procedures and in particular clarifies
when they are likely to perform well.

It is not clear how the results might be generalised if the norms used are
altered, either in the energy function or in the accompanying constraints on
the vectors $\tau_i$. It is likely that an analytical solution will not be possible in
this case.

A question that remains unresolved in our understanding of the application
of these methods is a satisfactory way of determining when the eigenvectors
for the smallest eigenvalues are harmonics of those already used and
should therefore be discarded. Those using the methods have derived various
heuristics, but it would be useful to gain greater understanding of the
factors involved.


We propose characterizing the cyclicity of molecular graphs by
considering their D/DD matrix. Each non-diagonal element of D/DD is a
quotient of the corresponding elements of the distance matrix D and the
detour matrix DD of a graph. In particular, we use the leading
eigenvalue of the D/DD matrix as a descriptor of cyclicity and
investigate, for monocyclic graphs $C_n$, how this eigenvalue depends on the
number of vertices n as n approaches infinity.

1.  INTRODUCTION

The distance matrix D of (molecular) graphs has received considerable attention in
mathematical chemistry and has been well studied. The elements of the matrix are
the distances $d_{ij}$, where $d_{ij}$ stands for the number of edges on the shortest path between
vertices i and j. The sum of the elements above the main diagonal in D gives the Wiener
number, a well-known graph invariant of interest in structure-property correlations.
The detour matrix DD of graphs, although suggested in the mathematical literature
some time ago, has only recently received attention in mathematical chemistry. The
elements $(DD)_{ij}$ of the detour matrix are the lengths of the longest paths between the
vertices i and j considered. It is interesting to observe that different graphs can have an
identical detour matrix (when vertices are suitably labelled), the first instance in
graph theory of non-isomorphic graphs being represented by an identical
matrix.

1.1  D/DD  Matrix

Construction of novel matrices and novel graph invariants by using quotients of
matrix elements of two different matrices or two different invariants, respectively, has
been introduced in chemical graph theory only relatively recently. An example is
the matrix whose (i,j) elements are obtained as a quotient between the Euclidean and
topological (graph theoretical) distance between vertices i and j. In this case the leading
eigenvalue apparently offers a measure of folding (bending) of a long chain.

Recently, a new graph matrix, the D/DD matrix, has been introduced. Its
diagonal elements are by definition equal to zero and its off-diagonal elements are given as
a quotient of the corresponding elements of the distance matrix D and the detour matrix
DD. Note that although the DD matrix can be identical for non-isomorphic graphs, the
distance matrix D is different for them and consequently the D/DD matrix has to be
different for them as well. It has been suggested that the D/DD matrix might offer a
novel characterization of cyclic structures. In particular its leading eigenvalue has
been put forward as an index of cyclicity of a graph.

Cyclicity and branching are concepts that have been widely used in chemistry
in a qualitative fashion. Attempts to assign a numerical magnitude to such concepts
have resulted in different definitions, since these quantities are difficult to define. In a
way, the prevailing definition is one that has been found useful, either in the
characterization of a model or because it leads to a further development of the field. To
illustrate this, let us recall the alternative generalizations of the Wiener number,
the well-known graph invariant which was defined initially only for trees, to
cycle-containing systems. Similarly, any definition of cyclicity will be to a degree
arbitrary, hence the intention is to come up with a definition that will quantify cyclicity
and involve as few non-structural arbitrary choices as possible. In order to achieve such
a goal we should first better understand the cyclicity of simple structures. With that in
mind we decided to examine more closely the characterization of monocyclic systems.
Therefore, we examined how the leading eigenvalue of D/DD varies with n, the size of
the cycle graphs $C_n$ representing simple monocyclic structures.

2.  MONOCYCLIC  SYSTEMS 

Small monocyclic systems $C_n$, n = 3 - 8, are illustrated in Figure 1. Cycle graphs
$C_n$ are vertex and edge transitive, leading to adjacency matrices in which each row
can be obtained from the first row by a cyclic permutation. The same applies to the
distance and detour matrices of $C_n$ and consequently to the D/DD matrix of $C_n$ as well.
Hence, cycle graphs have simple D/DD matrices. In each row of D/DD of an odd cycle
$C_{2k+1}$ every element appears twice except the diagonal element (Table 1). In the case of
even cycles $C_{2k+2}$, in each row of D/DD the largest element and the diagonal element
appear once whereas the remaining elements appear twice (Table 2). As is known
from linear algebra, the leading eigenvalue of a symmetric matrix with non-negative
entries is bounded from above and from below by its largest and smallest row sum,
respectively. In the case of vertex transitive graphs, the two sums are equal, hence for
$C_n$ the leading eigenvalue of D/DD equals any row sum. Because the elements of the
D/DD matrices of monocyclic systems have such a simple structure it is not difficult to
write down the explicit expressions for their row sums, $U_n$:

$$U_n = \begin{cases} 2 \sum_{k=1}^{(n-1)/2} \dfrac{k}{n-k} & \text{if } n \text{ odd} \\[2ex] 1 + 2 \sum_{k=1}^{n/2-1} \dfrac{k}{n-k} & \text{if } n \text{ even} \end{cases} \qquad (1)$$
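
The statement that the leading eigenvalue equals the row sum can be verified directly for small cycles. The following is a minimal sketch under the assumption that, on the cycle $C_n$, the distance between two vertices is the shorter arc length d and the detour is the longer arc length n − d; the function names `d_over_dd` and `row_sum` are illustrative only.

```python
import numpy as np

def d_over_dd(n):
    """D/DD matrix of the cycle C_n: entries d / (n - d), with d the cyclic distance."""
    idx = np.arange(n)
    gap = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(gap, n - gap)   # shortest-path distance on the cycle
    return d / (n - d)             # detour on a cycle is n - d; the diagonal gives 0 / n = 0

def row_sum(n):
    """Row sum U_n written out for the odd and even cases."""
    if n % 2:
        return 2 * sum(k / (n - k) for k in range(1, (n - 1) // 2 + 1))
    return 1 + 2 * sum(k / (n - k) for k in range(1, n // 2))

for n in range(3, 13):
    M = d_over_dd(n)
    leading = np.linalg.eigvalsh(M)[-1]   # largest eigenvalue of the symmetric matrix
    print(n, round(leading, 10), round(row_sum(n), 10))
```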


In Table 3 we have listed the values of the row sums of $C_n$ for smaller values of n. It
is easy to see that if n increases to infinity then the row sums also increase
indefinitely, even though the increments in each step are getting smaller. In this respect
the $U_n$ sums are reminiscent of the harmonic series, which is divergent, but at a very slow
rate. As is known, the difference between the partial sums of the harmonic series and the
logarithm of n tends to the Euler constant $\gamma$ ($\gamma$ = 0.5772...). The sequences built from the
non-zero elements of the first row of D/DD matrices may also be of some interest in
mathematics. Consider, for example, the first few non-zero matrix elements of D/DD for
the cycle $C_{11}$ (shown in Table 1) and $C_{12}$ (shown in Table 2):

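$$\tfrac{1}{10}, \tfrac{2}{9}, \tfrac{3}{8}, \tfrac{4}{7}, \tfrac{5}{6}; \qquad \tfrac{1}{11}, \tfrac{2}{10}, \tfrac{3}{9}, \tfrac{4}{8}, \tfrac{5}{7}, \tfrac{6}{6}.$$

We can observe some resemblance of these sequences with the corresponding harmonic
sub-sequence

$$\tfrac{1}{10}, \tfrac{1}{9}, \tfrac{1}{8}, \tfrac{1}{7}, \tfrac{1}{6}, \tfrac{1}{5}, \tfrac{1}{4}, \tfrac{1}{3}, \tfrac{1}{2}.$$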

In the next section we consider the asymptotic behavior of $U_n$.


3.  ON  THE  CONVERGENCE  OF  NORMALIZED  ROW  SUMS


The cyclicity constant $W_n$ has been defined in this journal, ref. 23, as the
normalized row sum $W_n = U_n / n$. The denominator makes the corresponding sequence
convergent. Table 4, and in particular its lower part, indicates that the normalized row
sum converges as $n \to \infty$, and that the convergence is slow. For instance, for n = 20 two
digits, for n = 100 three digits, and for n = 25000 only eight digits of the limiting value
W are reproduced. Moreover, the normalized row sums $W_n$ for odd n and
even n give a lower and an upper bound of W, respectively.

To calculate W with the aforementioned accuracy demands a lot of computation
owing to the slow convergence of $W_n$. Hence the question arises whether the limit can be
calculated analytically to even higher accuracy. The answer is affirmative, since we have
found an expression that makes this possible. Its derivation is given here.

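Let us define

$$S_{m,n} = \frac{1}{n} \sum_{k=1}^{m} \frac{k}{n-k} \qquad (2)$$

Using $S_{m,n}$ one can write:

$$W_n = \begin{cases} \dfrac{2}{n} \sum_{k=1}^{(n-1)/2} \dfrac{k}{n-k} = 2 S_{(n-1)/2,\,n}, & \text{for odd } n \\[2ex] \dfrac{1}{n} + \dfrac{2}{n} \sum_{k=1}^{n/2-1} \dfrac{k}{n-k} = \dfrac{1}{n} + 2 S_{n/2-1,\,n}, & \text{for even } n \end{cases} \qquad (3)$$

$S_{m,n}$ can be related to the harmonic series:

$$S_{m,n} = \frac{1}{n} \sum_{k=1}^{m} \frac{k}{n-k} = \frac{1}{n} \sum_{k=1}^{m} \frac{k - n + n}{n-k} = -\frac{m}{n} + \sum_{k=1}^{m} \frac{1}{n-k} \qquad (4)$$

and by shifting the summation index it can be transformed into:

$$S_{m,n} = -\frac{m}{n} + \sum_{t=n-m}^{n-1} \frac{1}{t} = -\frac{m}{n} + \{H(n-1) - H(n-m-1)\} \qquad (5)$$

where H(n) is the sum of the initial n terms of the harmonic series:

$$H(n) = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \qquad (6)$$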

$S_{m,n}$ has been represented above as a telescopic sum in which the corresponding
members of the two harmonic series that diverge will cancel each other. Formally, we
can introduce the log n function in order to convert the divergent harmonic sequences to
convergent sequences:

$$S_{m,n} = -\frac{m}{n} + \{H(n-1) - \log(n-1)\} - \{H(n-m-1) - \log(n-m-1)\} + \{\log(n-1) - \log(n-m-1)\} \qquad (7)$$

It is well-known that

$$\lim_{n \to \infty} [H(n) - \log n] = \gamma, \qquad (8)$$

where $\gamma$ = 0.5772156649... is the Euler constant.

For large values of n the cyclicity constant is approximately equal to $2 S_{n/2,\,n}$, with
$S_{n/2,\,n}$ given by the expression

$$S_{n/2,\,n} = -\frac{1}{2} + \{H(n-1) - \log(n-1)\} - \{H(n/2-1) - \log(n/2-1)\} + \{\log(n-1) - \log(n/2-1)\}. \qquad (9)$$

Finally, we obtain the sought-after limit W in a closed form:

$$W = \lim_{n \to \infty} 2 \left( -\tfrac{1}{2} + \gamma - \gamma + \{\log(n-1) - \log(n/2-1)\} \right) = 2 \log 2 - 1 = 0.38629436112\ldots \qquad (10)$$

The advantage of this analytical result for the cyclicity constant of cycle graphs $C_n$
as n tends to infinity is apparent. For instance, in computations for Table 4 we needed
2000 terms to obtain accuracy of about one part per million, because of the slow
convergence of the original series. However, if we use the analytical expression it is
possible to obtain additional significant digits of W without difficulty. As a
mathematical curiosity we note that the Euler constant $\gamma$ does not appear in the
expression for W, contrary to what one might expect.
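
The slow convergence towards the closed-form limit can be reproduced with a few lines of code. The sketch below is illustrative only (the names `W` and `W_LIMIT` are chosen here); it evaluates $W_n$ from the row-sum definition in double precision and prints the difference from $2 \log 2 - 1$ for several of the cycle sizes listed in Table 5.

```python
import math

W_LIMIT = 2.0 * math.log(2.0) - 1.0   # closed-form limit, about 0.386294

def W(n):
    """Normalised row sum W_n = U_n / n of the D/DD matrix of the cycle C_n."""
    paired = sum(k / (n - k) for k in range(1, (n - 1) // 2 + 1))  # entries that occur twice
    single = 1.0 if n % 2 == 0 else 0.0                            # the entry (n/2)/(n/2) for even n
    return (2.0 * paired + single) / n

for n in (3, 4, 20, 100, 1000, 25000):
    print(n, f"{W(n) - W_LIMIT:+.11f}")
```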

4.  ON  THE  INTERPRETATION  OF  THE  CYCLICITY  MEASURE  Wn 

In Table 4 we listed the difference between $W_n$ for adjacent n values. As n increases
the difference decreases and tends to zero. In order to arrive at an interpretation of
$W_n$ we have to consider what other structural elements of $C_n$ approach zero as n tends to
infinity. The curvature, which in the case of a circle is given by 1/R, tends to zero as R
increases, i.e., as finite segments of the circle approach a line. $W_n$ is independent of the
geometrical scale, thus it cannot be related to the curvature. We may, however,
consider a discrete analogue of curvature defined by the angle $\theta_n$ between the sides of a
regular n-gon. In contrast to the concept of curvature in geometry, which is
scale-dependent (see Figure 2), this discrete curvature is scale-independent. Thus, for example,
the curvature of all hexagons of Figure 2 is constant, while that of the concentric circles
decreases as R increases.

Discrete curvature is a measure of departure of an n-gon from a circle. Clearly, as n
increases the difference between the n-gon and the circle decreases (which has historically
been the basis for the early calculation of $\pi$). Thus we can take the difference $W_n - W$ (given
in Table 5) as a measure of "smoothness" of discretized circles.

An alternative approach is to consider instead of $W_n$ the quantity $W_n / n$ (see Table
5). This quantity has an advantage over $W_n$ and the difference $W_n - W$ in that it does not
alternate with even/odd parity changes; as we see from Table 5, it decreases
monotonically as $n \to \infty$. Clearly $W_n / n$ can also be taken as a measure of the
"smoothness" of discretized circles.
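
The contrast between the two measures can be checked over a range of cycle sizes. The sketch below is a hedged illustration: the range n = 3, ..., 200 is an arbitrary choice, the names are hypothetical, and the value $2 \log 2 - 1$ is used for W. It tests whether $W_n - W$ changes sign with the parity of n and whether $W_n / n$ decreases at every step.

```python
import math

W_LIMIT = 2.0 * math.log(2.0) - 1.0

def W(n):
    """Normalised row sum W_n of the D/DD matrix of the cycle C_n."""
    paired = sum(k / (n - k) for k in range(1, (n - 1) // 2 + 1))
    single = 1.0 if n % 2 == 0 else 0.0
    return (2.0 * paired + single) / n

sizes = list(range(3, 201))
diffs = [W(n) - W_LIMIT for n in sizes]
ratios = [W(n) / n for n in sizes]

# W_n - W: negative for odd n, positive for even n over the tested range.
alternates = all((d < 0) == (n % 2 == 1) for n, d in zip(sizes, diffs))
# W_n / n: strictly decreasing over the tested range.
monotone = all(a > b for a, b in zip(ratios, ratios[1:]))
print(alternates, monotone)
```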

The above interpretation answers a number of questions that could be raised when
considering numerical characterization of cyclic structures. It is clear now why the
convergence of $W_n$ is so important. It is not merely a matter of computation, but the
approach offers a basis for measuring the "smoothness" of discrete curves. We hope that
more light will be shed on the characterization of cyclicity by extending the present
work to polycyclic systems.


FIGURE  CAPTIONS


Figure 1. Small monocyclic graphs $C_n$, n = 3, 4, 5, 6, 7, 8.
Figure 2. Continuous and discretized circles of increasing radius.


Table 1. The first row of D/DD matrices for monocycles $C_n$, n odd. Other rows are obtained by
cyclic permutation.


Table 2. The first row of D/DD matrices for monocycles $C_n$, n even. Other rows are obtained by cyclic
permutation.


Table 3. The row sums $U_n$ for smaller monocycles $C_n$.

Table 4. The normalized row sums for monocycles of increasing size n, computed by Mathematica.

Table 5. Alternative measures of "smoothness" of discretized n-circles.

      n        W_n - W           W_n / n
      3    -0.05296102779    0.11111111111
      4     0.03037230555    0.10416666667
      5    -0.01962769445    0.07333333333
      6     0.01370563888    0.06666666667
      7    -0.01010388493    0.05374149660
      8     0.00775325838    0.04925595238
      9    -0.00613563096    0.04223985891
     10     0.00497548015    0.03912698413
     11    -0.00411542894    0.03474353929
     12     0.00346032864    0.03247955748
     19    -0.00138313372    0.02025848565
     20     0.00124844523    0.01937714032
     29    -0.00059417769    0.01330000633
     30     0.00055524760    0.01289498696
     39    -0.00032862318    0.00989655738
     40     0.00031240247    0.00966516909
     99    -0.00005101260    0.00390144796
    100     0.00004999750    0.00386344359
    499    -0.00000200802    0.00077413297
    500     0.00000200000    0.00077259272
    999    -0.00000050100    0.00038668054
   1000     0.00000050000    0.00038629486
   1999    -0.00000012512    0.00019324374
   2000     0.00000012500    0.00019314724
   4999    -0.00000002001    0.00007727432
   5000     0.00000002000    0.00007725888
   9999    -0.00000000500    0.00003863330
  10000     0.00000000500    0.00003862944
  24999    -0.00000000080    0.00001545239
  25000     0.00000000080    0.00001545177
